THIRD NORMAL FORM FUNDAMENTALS ―ZERO DEFECT DATA DESIGN

June 13, 2017 | Author: Enrique Villar Muñoz | Category: Functional Dependency

Enrique Villar
ATI #13639; ACM #4167456
C/ Mayor, 82, 1ºD, 30100 Murcia (SPAIN)
+34 686 494 927

1. INTRODUCTION

«The main objective of a relational database in Third normal form is to protect the designed tables from disruptive data redesigns caused by growth of the database, and to protect the application programs from costly changes and enhancements caused by degraded performance or by growing traffic. The specialty of Third normal form is achieving this with collections of tables that are easy to understand, to design and to maintain, simple to control and to operate upon, and consequently informative for casual users, especially those of data warehouses» (Digest of [1971Cod]). "But how it all began?" [1988Cal]. All of it "from the very beginning" [1971Cod]. A true myth surrounds it, impalpable and resilient. We can only tell the myth once more to the new generation, playing variations on the same old melody. "Operations on sets are analogous to arithmetic operations: for every pair of sets A and B we shall form their intersection A∩B, and we shall understand by this the set of all elements common to the sets A and B and also forming the union A∪B, and we shall understand by this the set composed of all elements of set A and all elements of set B. It is clear that these properties do not depend on whether these sets consist on numbers, points or other mathematical objects; they are general properties of sets". K. Kuratowski (1961) [1961Kur].

1.1 The database predicate

The data design of the relational model specifies each predicate R with its natural structure, i.e. a "meaningful unit that is as small as possible" [1979Cod], "without superimposing any physical structure of machine representation" [1970Cod] onto the logical one. An example is R (P, F, D, L, T, E, J), where R names the predicate type and P, F, D, … form the particular body (or intension) of R. The body is a business-oriented "descriptive predicate" [1963Sch], for example, CITY (CityCode, CityName). In addition, each cell holds an "atomic" (or nondecomposable) value [1970Cod], for instance, a number, a date, a word or a code. The ordering of attributes in the predicate carries no information [1987S&K], and "all information in the database" [1983S&B] is represented by values in the cells of tables. Each instance of the predicate is a polyadic element (or row) of a database set, so the row ordering is "insignificant" [1979Cod]. Such a polyadic set is "the relation". It varies as the business reality varies, and it gives its name to the database model.
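A minimal Python sketch can illustrate the order independence stated above. It assumes a relation is modelled as a `frozenset` of rows and each row as a `frozenset` of (attribute, value) pairs. The helper `row` and the CITY values are hypothetical illustrations, not part of any DBMS.

```python
# A relation modelled as a set of rows; each row is a set of (attribute, value)
# pairs, so neither row order nor attribute order carries information.
def row(**cells):
    """Build an unordered row from attribute=value pairs (hypothetical helper)."""
    return frozenset(cells.items())

# The same hypothetical CITY extension, written with rows and attributes
# in different orders.
city_a = frozenset({
    row(CityCode="MU", CityName="Murcia"),
    row(CityCode="MD", CityName="Madrid"),
})
city_b = frozenset({
    row(CityName="Madrid", CityCode="MD"),
    row(CityName="Murcia", CityCode="MU"),
})

print(city_a == city_b)  # True: only the set of rows matters
print(len(city_a))       # 2
```

Sets also absorb duplicates, so a repeated row would not create a second element. This matches the view of the relation as a set rather than a file of records.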

1.2 The predicate subject

The uniqueness of the rows of the set is the responsibility of the subject: each subject value is unique along its column at all times. In this mathematical role, the subject is the 'primary key', which is written underlined, e.g. CityCode in CITY (CityCode, CityName). The subject can also be a "definite denotation" [1910R&W], e.g. the combination of CityCode and DistrictNo in DISTRICT (CityCode, DistrictNo, DistrictName). There is "one and only one primary key" [1990Cod] per relation, because it is the subject of the database predicate and the nuclear part of the coordinates of every datum [1985Cod], i.e. 〈Relation-name · Primary-key-name(value) · Attribute-name〉.
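The following sketch shows both roles of the subject, assuming rows are Python dictionaries. `is_unique` checks that a key takes distinct values in every row. `datum` locates one value by its 〈relation, primary-key value, attribute〉 coordinates. The DISTRICT rows, the district names and the `primary_key` catalogue are hypothetical.

```python
# Hypothetical DISTRICT rows; the subject is the composite (CityCode, DistrictNo).
district = [
    {"CityCode": "MU", "DistrictNo": 1, "DistrictName": "North"},
    {"CityCode": "MU", "DistrictNo": 2, "DistrictName": "South"},
    {"CityCode": "MD", "DistrictNo": 1, "DistrictName": "Centre"},
]
db = {"DISTRICT": district}
primary_key = {"DISTRICT": ("CityCode", "DistrictNo")}  # hypothetical catalogue

def is_unique(rows, key):
    """True if the key attributes take a distinct value combination in every row."""
    values = [tuple(r[a] for a in key) for r in rows]
    return len(values) == len(set(values))

def datum(relation, key_value, attribute):
    """Locate one datum by <relation name, primary-key value, attribute name>."""
    key = primary_key[relation]
    for r in db[relation]:
        if tuple(r[a] for a in key) == key_value:
            return r[attribute]
    raise KeyError((relation, key_value))

print(is_unique(district, primary_key["DISTRICT"]))  # True
print(is_unique(district, ("DistrictNo",)))          # False: 1 repeats
print(datum("DISTRICT", ("MD", 1), "DistrictName"))  # Centre
```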

1.3 SQL: Intergalactic data speak

The declarative instrument for managing the informative rows of the database is SQL, an "intergalactic data speak" [1990Sto]. It is more verbose than our Codd notation, but it provides maximal independence and freedom everywhere:
- between the simple data definitions and the physical data engineering organization; and
- between the data definitions and the declarative, set-oriented data manipulation in the programs.

1.4 SQL: WYSIWYG

What-you-declare, i.e. the table(s), attribute(s), target row condition(s) and the ordering of the rows in your report, is-what-you-get. None of this mentions any object of the physical layer, such as a disk pack, an SQL SPACE or an OS file. The OS device type and the compression options are totally transparent, too. The access paths, i.e. a "database index" (Wikipedia), an "inverted list" [1989SAG] or a "table materialization" [1980A&L], exist but "under the cover" [1970Cod]. A Query Optimizer [1976Ast], [1984J&K], [1987Fre], [1999W&A], [1999EVM], [2015Ora] takes care of selecting the quickest access path among the available ones, including combinations of paths.
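The following is a minimal sketch with Python's standard `sqlite3` module. SQLite is used here only as a convenient, self-contained SQL engine, not as the system the text has in mind. The query declares a table, attributes, a row condition and an ordering. Creating an index afterwards changes the access paths available to the optimizer but not the result. Table contents are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE city (CityCode TEXT PRIMARY KEY, CityName TEXT NOT NULL)")
con.executemany("INSERT INTO city VALUES (?, ?)",
                [("MU", "Murcia"), ("MD", "Madrid"), ("BA", "Barcelona")])

# What-you-declare: table, attributes, row condition and ordering; no access path.
query = ("SELECT CityCode, CityName FROM city "
         "WHERE CityName >= 'M' ORDER BY CityName")
before = con.execute(query).fetchall()

# An access path is added "under the cover"; the declared query is unchanged.
con.execute("CREATE INDEX city_name_ix ON city (CityName)")
after = con.execute(query).fetchall()

print(before == after)  # True
print(after)            # [('MD', 'Madrid'), ('MU', 'Murcia')]
```

Prefixing the query with SQLite's `EXPLAIN QUERY PLAN` shows which path the planner chose. Its output format varies between SQLite versions, so it is not reproduced here.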

1.5 A very stable data design

In other words, the relational model tries to capture the more static part of your business. It leaves the tuning of the physical instances and of the application software (until both reach the expected steady state), as well as the physical planning, to the system programmer and the database administrator (abbr. DBA).

1.6 R-DBMS walk park view

"Typically, a database family would include: A database management system, a query facility, a logical database design aid, a physical database design aid, an application generator, and a data dictionary" [1983S&B].

1.7 Database dictionary

"An important property of the relational model is that both the database and its description are perceived by users as a collection of relations. The same relational language can interrogate and modify both the database information and the database dictionary" [1990Cod].
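In SQLite, for example, the catalog is exposed as the `sqlite_master` table. The same `SELECT` that reads business rows therefore also reads the schema, and `PRAGMA table_info` returns column descriptions as rows. This sketch uses one DBMS's dictionary through Python's `sqlite3`. Other systems expose the equivalent through views such as `INFORMATION_SCHEMA`.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE city (CityCode TEXT PRIMARY KEY, CityName TEXT)")
con.execute("CREATE TABLE district (CityCode TEXT, DistrictNo INTEGER, "
            "DistrictName TEXT, PRIMARY KEY (CityCode, DistrictNo))")

# The catalog is itself a table, read with the same SELECT used for business data.
tables = [name for (name,) in con.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)   # ['city', 'district']

# Column descriptions also come back as rows: (cid, name, type, notnull, dflt, pk).
columns = [info[1] for info in con.execute("PRAGMA table_info(district)")]
print(columns)  # ['CityCode', 'DistrictNo', 'DistrictName']
```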

1.8 Third normal form flash

In 1975, Ian Palmer, in his objective survey on Data Base Systems, wrote the following: "Data redundancy is difficult to avoid in a conventional system of independent files and applications, in its turn, an integrated database implies the control of redundant data" [1975Pal]. "The relational model is a mathematical approach build around two basic concepts: The logical storage structure used is the Relation in Third normal form, which is the type of relation with the optimal properties for use in database; and the powerful Data manipulation language is an application of predicate calculus to operate on relations" [1975Pal]. "Either concept could be applied separately in the design of a database system; together they provide an original approach with great potential" [1975Pal].

1.9 Third normal form melody

Codd introduced and discussed normalization of relations in 1971. His main goal was "developing the theory on database table design" [1990Cod] in the context of a full database schema. “Every database is intended to model some micro-world. Thus, the objects to which reference is made in the following list are those found in this micro-world” [1990Cod].

"Fueled by making the insertions, updates, and deletions clear in meaning and therefore easily understandable, the basic ideas in normalization are to organize the information in a database as follows” [1990Cod]: [1] “Each distinct type of object has a distinct type identifier, which becomes the name of a base relation; [2] Every distinct object of a given type must have an instance identifier that is unique within the object type; this is called its primary-key value; [3] Every fact in the database is a functional fact about the object identified by the primary key" [1990Cod]; [4] "Each such fact contains nothing other than the single-valued immediate properties of the object; [5] Such facts are collected together in a single relation” [1990Cod].

1.10 Data coupling vs. cohesive data

"It is the coupling together of facts of different type that gives rise to redundancy problems" [1990Cod] in tables that lack attribute cohesion, as summarized in the following table:

Data coupling                   Cohesive data
Partkey describes partkey       Nonkey describes primary key
Nonkey describes partkey
Nonkey describes nonkey

The formula "Nonkey describes primary key", in dependency terms, is "All columns that are not part of the primary key are dependent on the primary key" [1990Cod].

1.11 Third normal form canon

"The following facts are likely to be independent of one another with regard to their truth in the micro-world and, consequently, in their existence in the database" [1990Cod]:
☺ "Inserting a fact of one type does not require inserting a fact of another type at the same time" [1990Cod];
☺ "Deleting a fact of one type does not require deleting a fact of another type at the same time" [1990Cod]; and
☺ Changing the value of a fact [1999Poo] of one type does not require changing the value of the same fact in a non-empty set of rows of the R relation at the same time, let alone in other tables.
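A small, deliberately coupled table shows the third rule failing. This Python sketch assumes a hypothetical DISTRICT table that also repeats CityName in every district row. The data and the function name `spellings` are illustrative only.

```python
# Hypothetical coupled table: CityName describes CityCode, only part of the key.
district_flat = [
    {"CityCode": "MU", "DistrictNo": n, "CityName": "Murcia"} for n in (1, 2, 3)
]

def spellings(rows, code):
    """Distinct names currently recorded for one city code."""
    return {r["CityName"] for r in rows if r["CityCode"] == code}

# The same fact is stored once per district row.
copies = sum(1 for r in district_flat if r["CityCode"] == "MU")
print(copies)  # 3

# Update anomaly: changing the fact in one row only leaves the table contradictory.
district_flat[0]["CityName"] = "MURCIA"
print(sorted(spellings(district_flat, "MU")))  # ['MURCIA', 'Murcia']
```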

1.12 Third normal form overture

Third normal form (abbr. 3NF) starts with a set of relations whose attributes include references to primary keys of other relations, i.e. a database schema with referential integrity [1985Cod].

For instance, SCHEMA_D: {CITY (CityCode, CityName); DISTRICT (CityCode∊CITY, DistrictNo, DistrictName)}

1.12.1 Foreign key

"CityCode∊CITY" is a foreign key in the DISTRICT relation. A foreign key (abbr. FK) is a single attribute of a database relation that is also the primary key of another relation of the same database schema. The concept originates in the Relational model of databases (it is the basis of RM "Referential integrity" [1985Cod]). In Codd notation, an attribute is declared a foreign key simply by tagging it with the sign '∊' [ElementOf]. The sign is always accompanied by the name of its home set, i.e. the name of an existing relation, e.g. CITY, in the same schema, e.g. SCHEMA_D, as the current relation, e.g. DISTRICT.

1.12.2 Foreign key is a 3NF tool

Tagging CityCode with "∊CITY" switches on the referential integrity mechanism, so that:
(1) the codes of the primary key of CITY are the source of the values of the CityCode attribute;
(2) the spelling of each city name occurs only once in the database, because only its code is part of the DISTRICT rows; and
(3) if the name of a city must be changed, the change occurs in "one place" [1999Poo] without involving any other CITY row or any DISTRICT row.

E. Villar THIRD NORMAL FORM FUNDAMENTALS
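SCHEMA_D can be sketched in SQLite through Python's `sqlite3`. The sketch treats SQLite's `REFERENCES` clause as the SQL counterpart of Codd's `∊CITY` tag. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`. The district rows and the new spelling 'MURCIA' are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
con.executescript("""
CREATE TABLE city (
    CityCode TEXT PRIMARY KEY,
    CityName TEXT NOT NULL
);
CREATE TABLE district (
    CityCode     TEXT NOT NULL REFERENCES city (CityCode),  -- CityCode∊CITY
    DistrictNo   INTEGER NOT NULL,
    DistrictName TEXT NOT NULL,
    PRIMARY KEY (CityCode, DistrictNo)
);
""")
con.execute("INSERT INTO city VALUES ('MU', 'Murcia')")
con.executemany("INSERT INTO district VALUES ('MU', ?, ?)",
                [(1, "North"), (2, "South")])

# (1) CityCode values must come from CITY's primary key.
try:
    con.execute("INSERT INTO district VALUES ('XX', 1, 'Nowhere')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True

# (3) Renaming the city touches exactly one row, in "one place".
cur = con.execute("UPDATE city SET CityName = 'MURCIA' WHERE CityCode = 'MU'")
print(cur.rowcount)  # 1
```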

1.13 Third normal form refrain

“The problem with database tables that are not fully normalized is that insertions, updates, and deletions can create unpleasant surprises for users because of anomalies in their behaviour and meaning” [1990Cod].

1.14 Third normal form nutshell

Data design quality is conformance to requirements [1991Han]. Defect prevention is preferable to data review and correction [1991Han]:

IF Zero defect data design THEN Zero defect data
IF Zero defect data THEN Zero data redundancy

The radical solution of Third normal form, applied from the beginning, is:

NAME SPELLING OCCURS IN ONE PLACE & ONE BUSINESS PREDICATE PER RELATION

Our "design methods tend naturally to produce normalized schemas" [1992BCN].

1.15 Third normal form environment

Third normal form does not only aim at solving the defects. It also holds the vision that database design must switch from application-oriented files to business-oriented databases. Data reflect the more stable part of "{ALGORITHMS + DATA} = SYSTEMS" [1985Wir]. The programs will therefore be changed or enhanced only sparingly, closely following the business needs. To achieve this, each 3NF table has only the attributes that capture the known semantics of a business subject, i.e. the identifier of the entity and all its known descriptive properties (if any), and nothing more.

1.15.1 Preserving database programs

Adding new attributes to tables in 3NF, 2NF and 1NF does not impair database programs in any case (query, insert, change or delete), according to RM Physical data independence [1985Cod] and Logical data independence [1985Cod]. Conversely, when programs stop using obsolete attributes, the database service is not impaired either.
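The following SQLite sketch, using Python's `sqlite3`, illustrates this property under assumptions the claim depends on in practice. The existing program names the columns it reads and writes, and the new attribute is nullable (or has a default). A `SELECT *` or an `INSERT` without a column list would see the new column. A `NOT NULL` column without a default would reject the old inserts. The `Population` column is hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE city (CityCode TEXT PRIMARY KEY, CityName TEXT NOT NULL)")
con.execute("INSERT INTO city (CityCode, CityName) VALUES ('MU', 'Murcia')")

def city_report(con):
    """An existing 'program' that names exactly the columns it uses."""
    return con.execute(
        "SELECT CityCode, CityName FROM city ORDER BY CityCode").fetchall()

before = city_report(con)

# A new, nullable descriptive attribute is added to the table.
con.execute("ALTER TABLE city ADD COLUMN Population INTEGER")
# The old insert statement, with its explicit column list, still works.
con.execute("INSERT INTO city (CityCode, CityName) VALUES ('MD', 'Madrid')")
after = city_report(con)

print(before)  # [('MU', 'Murcia')]
print(after)   # [('MD', 'Madrid'), ('MU', 'Murcia')]
```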

1.16 No attribute migration

Apart from reducing redundancy and its inherent dysfunctionality, there was a third, earlier motivation: minimizing schema reorganizations ("attribute migration" [1971Cod], in Codd's words) coming from:
1) "Changes in the part of microworld of reference" [1971Cod];
2) the introduction of new controls, e.g. ownership of data, access authorization, recovery, etc. [1971Cod];
3) "A relation can become too cumbersome in size and fuzzy in meaning on the basis of adding attributes to it" [1971Cod].
"Database reorganization of attributes is the obvious answer logically impairing application programs. Stabilize database and reduce attribute migrations accordingly, it is desirable. By casting the data base in third normal form the earliest possible time and by keeping it that way, an installation will reduce the incidence of attribute migration to a minimum, and consequently have less trouble keeping its application programs in a viable state" [1971Cod].

1.17 Cohesive software

If the development cost of a source line is x programmer-time units, then, according to Yourdon & Constantine [1979Y&C], the maintenance cost of that line is 40x, i.e. forty times the initial development cost. This horrible disproportion motivates cohesive programs in structured applications [1979Y&C] from the beginning, in order to save "the discovery, analysis, redesign, and correction of lurking bugs" [1979Y&C].

2016-01-2412:51:18 Page 2 of 76

Analogously, the cost of "attribute migration" [1971Cod] motivates cohesive relations, i.e. relations in 3NF, "from the beginning" [1971Cod], because "third normal form will significantly extend the life expectancy of application programs" [1971Cod].

1.18 Communication structure

This communication has 28 REPORTS (chapters 2 to 29):
1. INTRODUCTION
2. PREDICATE SERIES
3. DATABASE PREDICATE
4. DATABASE RELATION
5. DATABASE TABULATION
6. DATABASE DEPENDENCY
7. MAXIMAL INDEPENDENT SUBSET
8. BASE RELATION
9. DATABASE INTEGRITY
10. DATABASE REDUNDANCY
11. IRREDUCIBLE UNF DATA MODEL
12. ZERO NORMAL FORMS
13. DESIGNER RELATION
14. DATABASE NORMALIZATION
15. THIRD NORMAL FORM 2.0
16. DATA DESIGN REVIEW
17. HOLISTIC DATA DESIGN
18. LEGO® DATA DESIGN
19. EXTREME NORMALIZATION
20. QUINE NORMAL FORM
21. ONTOLOGIC NORMAL FORM
22. INCLUSION NORMAL FORM
23. UNMARKED NORMAL FORM
24. LACONIC NORMAL FORM
25. OCCAM NORMAL FORM
26. UNION NORMAL FORM
27. NATURAL NORMAL FORM
28. DATABASE OPTIMIZATION
29. ACKNOWLEDGMENTS
30. REFERENCES
31. APPENDICES
32. ALPHABETIC INDEX

2. PREDICATE SERIES

«The relational model is closely related to predicate logic» [1979Cod]. This report on the database predicate follows a series of meaningful logical steps leading to the establishment of the database predicate:
• Hughes & Londey: predicate definition;
• Russell: xRy predicate;
• Heijenoort: Incremental approach;
• Frege: More than one subject;
• Carnap: More than one property;
• Maddux: A set of ordered pairs;
• Codd: A set of polyadic predicates.

2.1 Hughes & Londey: PM predicate

A predicate Pxy is any Logic expression which stands for a property x which a subject (thing or person) y can possess. This laconic definition starts the 'PM system' of Hughes & Londey [1965H&L]. 'PM' means "Principia Mathematica", i.e. the title of the seminal book of Russell & Whitehead [1910R&W] on this topic.

2.1.1 Predicate property

"The word property here is being used in a very wide sense. Not only those characteristics of things which would ordinary be said to be their qualities or properties, such as their color or their shape, but also e.g. their location or their activities or their habits, are rank as properties" [1965H&L].

2.2 Russell: xRy predicate

Russell looked for the principles of Mathematics by demonstrating all the demonstrable assertions, arriving at one of these principles whenever a demonstration was not available. In this way, Russell put the xRy logic formula at the center of the PM system [1910R&W], dedicating five sections of the first volume of Principia Mathematica to binary relations. xRy is a well-known logic predicate describing an existing relationship [1910R&W], for instance: x greater-than y (x>y), x loves y (x♥y), x superset-of y (x⊃y), x pecks y (xPy), etc.

2.2.1 xRy represents a binary relation

The atoms x and y are individuals of the same set, which is called the universe U. The left-hand side of R, i.e. x, is the referent subject; the right-hand side, i.e. y, is the relatum subject [1903Rus]. A pair of two variables is written "(x, y)", whereas an ordered pair of variables is distinguished by "〈x, y〉". In this way, each ordered pair 〈x, y〉 is an element of U² = U × U.
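A binary relation can be sketched in Python as a set of tuples. A tuple is ordered, so it plays the role of 〈x, y〉. The four-element universe and the greater-than relation are hypothetical choices made only for this illustration.

```python
from itertools import product

U = {1, 2, 3, 4}                   # hypothetical universe
U2 = set(product(U, repeat=2))     # U² = U × U, all ordered pairs <x, y>

# xRy as a set of ordered pairs: "x greater-than y".
greater_than = {(x, y) for (x, y) in U2 if x > y}

print(greater_than <= U2)          # True: R is a subset of U²
print((3, 1) in greater_than)      # True: 3 R 1
print((1, 3) in greater_than)      # False: order matters in <x, y>
print(len(U2), len(greater_than))  # 16 6
```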

2.3 Heijenoort: Incremental approach

Heijenoort [1974Hei] gives the first two incremental approaches to the logic predicate, singling out two basic forms of assertions:
• Pxy means "the ascription of property x to individual y";
• xRy means "x subject bears the relation R to y subject" [1954Qui]. This replaces the cumbersome "The relation R of individual x applies to individual y", which would be the "Pxy" style of reading a simple xRy.

2.3.1 Subject and predicating roles

Heijenoort also clarifies the roles of the subject and the predicating part of Pxy [1974Hei]:
• the subject, i.e. y, identifies a case;
• the predicating part, i.e. 'Px', sorts such a case.

2.4 Frege: More than one subject

Frege has the credit [1974Hei] of having introduced the first generalized subject-predicate Pxy form, i.e. the one in which there is more than one subject. It consists of the enumeration of identified subjects y¹, y², … yⁿ and the ascription of the same property x to each of them. Frege's new formula is:

Pxy¹y²y³ … yⁿ

For instance, "The Evangelists are Mark, Matthew, Luke and John", whose predicate formula is Pxy¹y²y³y⁴. In a modern perspective, Px could be a class and each subject y a member of the class: EVANGELIST: {Mark, Matthew, Luke, John}.

2.5 Carnap: More than one property

Pxy with more than one property, i.e. Ptvwxy, is the image of a descriptive predicate [1937Car]. Paraphrasing Hughes & Londey [1965H&L], "A descriptive predicate Ptvwxy is any Logic expression which stands for an enumeration of properties {... t, v, w, x} which a subject (thing or person) y can possess". "A language is called logical if it contains only [logical symbols] otherwise descriptive" [1937Car]. Carnap, in The Aufbau (1934), coined the term, e.g. “An expression is called descriptive when either a descriptive predicate or a descriptive functor occurs in it” [1963Sch]. Carnap made a clear contribution to the evolution of the logic predicate by introducing the terms 'descriptive functor' and 'descriptive predicate'. Let us define Descriptive Functor and Descriptive Predicate, just for the purpose of a fluid reading of this succinct evolution of the Pxy logic expression.

«Bertrand Russell described several interesting operations on binary relations in the Principia» [1972Cod].


2.5.1 Descriptive functor

"In order to express properties or relations of position by means of numbers, we shall use functors. For instance: let 'te' be the temperature functor; 'te(3)=5' then means: "the temperature at the position 3 is 5"; if we take the functor 'tdiff' to represent temperature difference, then 'tdiff(3,4)=2' means "the difference of the temperatures at positions 3 and 4 equals 2" [1963Sch]. In general, a descriptive functor transforms a definite expression into its definiens, e.g. WIDOW ("Gustav Mahler") gives "Alma Schindler" [1986Dea]; FirstName ("Roger D. Maddux") gives "Roger"; FLAGTYPE ("French Flag") is "tricolor"; SIN (90°) gives 1; SQRT (9) gives 3; etc.

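The functors above can be sketched as ordinary single-valued Python functions. `widow` is a one-entry lookup table built from the text's example. `first_name` assumes the first whitespace-separated token is the first name. The temperature readings behind `te` are hypothetical values chosen to reproduce te(3)=5 and tdiff(3,4)=2, and `tdiff` is assumed to be an absolute difference.

```python
import math

# A descriptive functor maps a definite expression to exactly one value.
widow = {"Gustav Mahler": "Alma Schindler"}.get   # WIDOW, as a lookup table

def first_name(full_name):
    """FirstName: assumes the first token of the full name is the first name."""
    return full_name.split()[0]

readings = {3: 5, 4: 3}   # hypothetical temperatures by position

def te(position):
    """Temperature functor te."""
    return readings[position]

def tdiff(p, q):
    """Temperature-difference functor, assumed here to be an absolute difference."""
    return abs(te(p) - te(q))

print(widow("Gustav Mahler"))         # Alma Schindler
print(first_name("Roger D. Maddux"))  # Roger
print(te(3), tdiff(3, 4))             # 5 2
print(math.sin(math.radians(90)))     # 1.0
print(math.sqrt(9))                   # 3.0: a functor returns the principal root only
```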
2.5.2 Definite denotation

The definite denotation of Russell [1910R&W], e.g. "the widow of Gustav Mahler", is a clear precedent of the more formal and general descriptive functor of Carnap. In our case, we will consider definite denotations as an informal antecedent of descriptive functors and clearly oriented to denote subjects.

2.5.3 Descriptive predicate

A descriptive predicate is a pragma containing a unique subject and a set of attributes. The attributes describe the predicate subject either directly or indirectly by means of a descriptive functor. Even the subject itself can be characterized by a functor [1986Dea], in which case definite denotation also applies. A formal view of database denotations is given in the Characteristic primary key section.

2.6 Maddux: A set of ordered pairs

"The Axiom of Unordered Pairs allows the construction of an ordered pair of sets according the well-known definition of Kuratowski [1920Kur]" [2006Mad]. "Binary relations are defined as classes of ordered pairs" [2006Mad]. The ordered pairs 〈x, y〉 as elements of U² can be represented by a "Peirce matrix" [1883Pei] as follows. "Let A, B, C, D, etc., be all the individual objects in the universe; then all the individual pairs may be arrayed in a block" [1883Pei], thus:

Figure 1. Peirce matrix (1883)

"There are axioms that guarantee the existence of relative products and converses of binary relations" [2006Mad]. "The membership relation axiom allows the construction of the identity relation on sets" [2006Mad]. The membership binary relation, under the label ElementOf, has its own entry in the DATABASE PREDICATE report.
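A Peirce-style block can be sketched in Python as a 0/1 matrix indexed by the individuals A, B, C, D. The converse and the relative product are computed on sets of ordered pairs. The `parent` relation is a hypothetical example. The checks show that the converse corresponds to the transposed matrix and that composing with the identity relation leaves a relation unchanged.

```python
# Individuals of the universe, as in Peirce's block A, B, C, D.
U = ["A", "B", "C", "D"]

def matrix(rel):
    """Peirce matrix: 1 in row x, column y when <x, y> belongs to rel."""
    return [[1 if (x, y) in rel else 0 for y in U] for x in U]

def converse(rel):
    """<y, x> for every <x, y> in rel."""
    return {(y, x) for (x, y) in rel}

def relative_product(r, s):
    """<x, z> such that x r y and y s z for some y."""
    return {(x, z) for (x, y1) in r for (y2, z) in s if y1 == y2}

identity = {(x, x) for x in U}                 # identity relation on U
parent = {("A", "B"), ("B", "C"), ("C", "D")}  # hypothetical relation
grandparent = relative_product(parent, parent)

for line in matrix(parent):
    print(line)
print(sorted(grandparent))                     # [('A', 'C'), ('B', 'D')]
transposed = [list(col) for col in zip(*matrix(parent))]
print(matrix(converse(parent)) == transposed)  # True: converse = transpose
print(relative_product(parent, identity) == parent)  # True
```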

2.7 Codd: A set of polyadic predicates

Codd [1979Cod] interprets his seminal work on "large shared data banks" [1970Cod] as a part of Logic. That is, the intension of the RM predicate is polyadic and unordered or, in mathematical terms, the RM predicate is a set of attributes. The flat structure of a polyadic predicate facilitates forming the set of instances of the predicate in a relation of a database. Here, 'RM predicate' and 'database predicate' are synonyms.

2.7.1 Predicate intension

The predicate intension is the formula "P(...xy)" [1959Nic] corresponding to a given database predicate. The intension of the RM predicate is the set of attributes, including the subject and its properties, as already stated above in the Hughes & Londey: PM predicate section.

2.7.2 Predicate extension

The predicate extension [1959Nic], i.e. P*, is the set of instances of current P: «P*: {(...xy)¹, (...xy)² ... (...xy)ⁿ}» [1959Nic].

2.7.3 Subject of a polyadic predicate

The subject always exists, occupies one of the attributes and is known as "the primary key" [1970Cod]. The subject plays the important role of identifying its row uniquely, because its own values are unique along its column. Since the attribute ordering is "insignificant" [1979Cod], the subject is always marked by underlining (here it is y):

R(x¹, x², x³, ... xⁿ, y)

The remaining attributes (if any), i.e. the predicating part (x¹, …, xⁿ), are a set of properties describing the subject.

3. DATABASE PREDICATE

Let us call "database predicates" the instances of Codd's polyadic predicate that are clearly oriented to be part of a set in a relational database. This report continues the description of this original predicate with the following sections:
• The normal form;
• Database predicate;
• ElementOf;
• Predicate-predicate connections;
• Undisciplined predicate;
• Disciplined data design;
• Disciplined predicates;
• Components of RM predicate;
• Database predicate structure; and
• The five RM/T disciplined predicates.

3.1 The normal form

"The normal form" [1970Cod] is the original name of the First normal form [1971Cod] (abbr. 1NF). "A relation R is in First normal form, if it has the property that none of its domains has elements which are themselves sets" [1971Cod]. The 1NF of this section is the pristine definition of a relation in the database field. At that time the important message was "Please, be relational: leave the mix of logical and physical design and enter the pure world of functional design". In this section, E. F. Codd (1970) describes "the normal form" of the database predicate in his own words, although the ordering and the subtitles are not his:
• Logical & stable data design;
• Column & rows vs. physical stuff;
• Unique identifiers;
• Logical definition language; and
• Say what you want not how.

3.1.1 Logical & stable data design

"In order to discuss a preferred way (or normal form), we must first introduce a few additional concepts (active domain, primary key, foreign key)" [1970Cod]. "A relational model of data is proposed as a basis for protecting users of formatted data systems from the potentially disruptive changes in data representation caused by growth in the data bank and changes in traffic" [1970Cod].

3.1.2 Column & rows vs. physical stuff

"A relation whose domains are all simple can be represented in storage by a two-dimensional column-homogeneous array" [1970Cod]: "(1) It would be devoid of pointers (address-valued or displacement-valued): (2) It would avoid all dependence on hash addressing schemes; and (3) It would contain no indices or ordering lists" [1970Cod].

3.1.3 Unique identifiers

"Normally, one domain (or combination of domains) of a given relation has values which uniquely identify each element (n-tuple) of that relation. Such a domain (or combination) is called a primary key. In the example above, part number would be a primary key, while part color would not be" [1970Cod].

3.1.4 Logical ℝ definition language

"ℝ permits the declaration of relations and their domains; ℝ identifies the primary key for that relation and its foreign keys" [1970Cod]. "If the user’s relational model is set up in normal form, names of items of data in the data bank can take a simpler form than would otherwise be the case" [1970Cod].

3.1.5 Say 'what' you want not 'how'

"A first order predicate calculus suffices if the time-varying collection of relations is in normal form" [1970Cod].

3.2 Database predicate

What constitutes the database predicate? Firstly, "a predicate calculus formula Pab...z which is relatively stable over time (this is sometimes called the intension)" [1979Cod]. The database predicate is called a 'relation', and it uses the 'R' of the xRy formula of "binary Relations" predicates. Nevertheless, it follows a new form, R(a, b, c ...), which is the form of the polyadic predicate Pxⁿ…x¹y. The intension of R is an unspecified but finite set of attributes, i.e. it is not exclusively binary but polyadic. Therefore the database predicate can only be a sophisticated version of the PM predicate [1965H&L] whose instances (the rows) are clearly oriented to share a set, as the instances of the xRy predicate easily do [2000Mad].

3.2.1 PM identity of database predicate

The instances of the database predicate, i.e. the database rows, have the same identity as the PM predicate [1965H&L]: each can be TRUE or FALSE, otherwise it is not a PM well-formed formula [1965H&L]. A short entry on this is given in the PM Identity appendix. Nevertheless, the database rows have an individual empirical component in reality (as binary relationships do), so they will always be TRUE [1910R&W] when evaluated as part of a discourse or an argument. Obviously, an item of an RM database is TRUE only if its value corresponds to the business reality.

3.2.2 Data integrity

The correspondence between every database item and the business reality is the concern of the DATA INTEGRITY field. A DATABASE INTEGRITY report is part of this communication.

3.3 ElementOf

ElementOf is a predicate whose symbol is '∊': in x∊A, the variable x represents an element and the variable A is the name of the set. For instance, we write "2∊ℿ" and read "2 is an element of the set of prime numbers". In RM Logic, the attribute name and the table name are variables. In the present case, we write "R.P∊B" and read "P is a foreign key in R and P is the primary key in relation B". This logic syntax for foreign keys is very compact and makes it easy to document FKs as part of Codd's well-known FD specifications.
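Read extensionally, R.P∊B holds when every value of P in R is an element of the set of primary-key values of B. The following Python sketch checks that inclusion. The CITY and DISTRICT extensions, including the dangling code 'XX', are hypothetical. `element_of_violations` is an illustrative helper, not a standard function.

```python
# Hypothetical extensions: the home set CITY and the referencing relation DISTRICT.
city = {"MU": "Murcia", "MD": "Madrid"}  # primary key -> CityName
district = [("MU", 1, "North"), ("MD", 1, "Centre"), ("XX", 7, "Orphan")]

def element_of_violations(fk_values, home_keys):
    """Values of R.P that are not elements of B's primary-key set (R.P∊B fails)."""
    return {v for v in fk_values if v not in home_keys}

fk_values = {code for (code, _no, _name) in district}
print(element_of_violations(fk_values, set(city)))  # {'XX'}
print(fk_values <= set(city))                       # False: R.P∊B does not hold
```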

3.3.1 ElementOf is part of FD theory

ElementOf has great logical value because it can be used smoothly on both sides:
• as part of the database predicate; and
• as part of Functional Dependency (abbr. FD) substructures.

3.3.2 ElementOf closes a semantic gap

The best news, however, is that ElementOf closes an existing semantic gap when a dependency structure is broken up into several substructures. It also does so when an FD substructure is interpreted as a 3NF entity type. Thanks to ElementOf, the representation principle [1982Zan] holds without forcing any logic. It also helps that all the multiattributes of R are disjoint [1977Fag] and minimal.

3.4 Predicate-predicate connection

The RM predicate is a polyadic predicate whose variables, e.g. those of R (P, F, D, L, T∊B, E, J), are known as 'attributes' or 'columns', where:
− "R" is the name of the set {P, F, D, L, T, …};
− "P" is the subject of the predicate R;
− "T∊B" means that T is a foreign key whose home set is the B relation.
Each Codd descriptive predicate that is part of a set can connect with predicate instances of other sets. This is done by using the membership relation as part of the attribute specifications. During the design of the intension of R, Peano's notation as used by Codd [1970Cod], i.e. S(s, x∊R, etc.), fulfils this important role.

3.5 Undisciplined predicate

"Undisciplined application of the predicate calculus in designing a database could yield an incomprehensible and unmanageable pool of assertions" [1979Cod].

3.5.1 Convoluted predicate

Let us instantiate the standard polyadic predicate formula of a relation, i.e. R(x¹, x², x³, x⁴, x⁵, y), as R(F, D, L, T∊B, E, J, P). Data designers were unaware of the possible semantic digressions introduced into the formal descriptive discourse. One example is when the J attribute is a property of the E attribute, which yields the following formula: R′(x¹, x², x³, [(x⁴) x⁵], y). Grammatically, such a digression, represented by "…, [(x⁴) x⁵], …", is a "relative clause" (Wikipedia), i.e. a subordinate second level of discourse which is also a new predicate inside the original one. In this innocent switch, the E attribute also becomes the subject of the property J. The switch transforms the R predicate from a clear well-formed formula (abbr. wff) of the PM system into a descriptive pragma containing two descriptions. In the form of Dependency Grammar [1970Rob], these could be interpreted as follows:

DB predicate                      Predicate interpretation
R(x¹, x², x³, x⁴, x⁵, y)          R{(x¹, x², x³, x⁴, x⁵)↢y}
R′(x¹, x², x³, [(x⁴) x⁵], y)      R′{(x¹, x², x³, [(x⁴)↢(x⁵)])↢y}

3.5.2 Impairing the PM subject unity

"R{(x¹, x², x³, x⁴, x⁵)↢y}" means "all columns that are not part of the primary key are dependent on the primary key" [1990Cod]. On the other hand, "R′{(x¹, x², x³, [(x⁴)↢(x⁵)])↢y}" has "one governor" [1970Rob], which is y, but now there is also a new subject, x⁵. In R′, the outer "…↢y" still means "all columns that are not part of the primary key are dependent on the primary key" [1990Cod]. In addition, "(x⁴)↢(x⁵)" means "column x⁴ depends on column x⁵", and "([(x⁴)↢(x⁵)])↢y" means "columns x⁴ and x⁵ are dependent on y". R′ introduces a relative clause into the formal discourse of the descriptive wff R, impairing the PM subject unity. R′ therefore becomes less clear than R. Moreover, R′ is probably no longer a wff.


3.5.3 Undisciplined data design

In the context of the Relational model, the meandering predicate formally results from two data design practices when designing the 'nonkey' [1983Ken] attribute set:
• creating R with a nonkey attribute that is a property of a part of a composite candidate key [1983Ken] is a "Partial Dependence" [1971Cod] (abbr. PD);
• specifying R with a nonkey attribute describing another nonkey attribute [1983Ken] is a "Transitive Dependence" [1971Cod] (abbr. TD).
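Once the key and the dependencies are written down, the two practices can be told apart mechanically. This Python sketch classifies a dependency lhs → rhs on a nonkey attribute relative to a single primary key. It is a simplification: it treats any determinant containing a nonkey attribute as transitive and ignores other candidate keys. The attributes CityName, ManagerId and ManagerName are hypothetical.

```python
# Hypothetical relation with primary key (CityCode, DistrictNo).
key = frozenset({"CityCode", "DistrictNo"})

# Dependencies lhs -> rhs, i.e. rhs depends on lhs (the text's rhs ↢ lhs).
fds = [
    (frozenset({"CityCode", "DistrictNo"}), "DistrictName"),
    (frozenset({"CityCode"}), "CityName"),
    (frozenset({"CityCode", "DistrictNo"}), "ManagerId"),
    (frozenset({"ManagerId"}), "ManagerName"),
]

def classify(lhs, rhs, key):
    """Label a dependency lhs -> rhs relative to a single primary key."""
    if rhs in key:
        return "key attribute"  # not a nonkey dependency; outside this check
    if lhs >= key:
        return "full"           # nonkey describes the whole primary key
    if lhs < key:
        return "PD"             # nonkey describes part of the key
    return "TD"                 # determinant contains a nonkey attribute

for lhs, rhs in fds:
    print(sorted(lhs), "->", rhs, ":", classify(lhs, rhs, key))
```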

3.5.4 Transitive key dependence

Simplifying "([(x⁴)↢(x⁵)])↢y", we get "x⁴↢x⁵↢y", whose formal interpretation is {x⁴↢x⁵ AND x⁵↢y} (1). But (1) is the left-hand side of IF {x⁴↢x⁵ AND x⁵↢y} THEN {x⁴↢y} (2). Formula (2) is the pattern of the transitive property of the Functional Dependency (abbr. FD) [1971Cod] binary relation (see [1903Rus] and [2000Mad]). The format of this holistic formulation is part of the Absorption Laws of FD (2.2 Boolean algebra of x report).
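The standard attribute-closure algorithm for functional dependencies can check inference (2). In the sketch below, the document's a↢b ("a depends on b") is written in the conventional direction b → a. So x⁵↢y becomes y → x5, and x⁴↢x⁵ becomes x5 → x4. The attribute names are plain strings chosen for the illustration.

```python
def closure(attrs, fds):
    """Attribute closure under FDs given as (lhs frozenset, rhs frozenset) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# x5 <- y (y determines x5) and x4 <- x5 (x5 determines x4), as lhs -> rhs.
fds = [(frozenset({"y"}), frozenset({"x5"})),
       (frozenset({"x5"}), frozenset({"x4"}))]

print(sorted(closure({"y"}, fds)))  # ['x4', 'x5', 'y']: so x4 <- y holds
print("y" in closure({"x5"}, fds))  # False: x5 does not determine y
```

The second check is what makes the dependence transitive rather than a second key: x5 determines x4 but does not determine the subject y.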

3.2 Disciplined data design

The previous formalization is part of an important R&D project in which Codd looked for the causes of database redundancy before 1970; it explains the following line of the seminal paper of that year: "Further operations of a normalizing kind are possible" [1970Cod] upon the "normal form". He later defined two more normal forms: the second normal form is free of partial dependence, and the third is free of both partial and transitive dependence.

3.3 Disciplined predicates

"Early in 1979" [1990Cod], Codd presented in Hobart (Tasmania) an "extended version" of "the Database Relational Model to Capture More Meaning, naming the extended version RM/T (T for Tasmania)" [1990Cod]. There Codd was looking for "meaningful units that are as small as possible" [1979Cod], and he presented four categories of predicates that we know as "RM/T entities". Moreover, these four RM/T categories are cohesive predicates whose instances continue to share a polyadic set.

3.3.1 RM/T predicates are in 3NF

The RM/T semantic research guards against such digressions with the lemma "meaningful units that are as small as possible" [1979Cod], which really means "irredundant meaningful units". Moreover, Codd [1979Cod] offers four positive ways of designing directly in third normal form. From the logician's point of view, this step is an imaginative and interesting advance; from the point of view of data designers, database administrators and daily database operations staff, it is a big deal.

3.3.2 Zero defect data design 2.0

The era of zero-defect data design restarts with the advanced predicate system of RM/T, now also fueled by the persistent target of 'Zero defect data' [1991Han], whose best instrument has always been designing in Third normal form "from the beginning" [1971Cod].

3.4 Database predicate structure

The Third normal form for Relational databases is based on:
• A mandatory Primary Key (aka "the Key", the row "Identifier", the predicate Subject) that articulates:
  • An optional Denotative part, i.e. a set of Alternate Key attributes (aka Minor-key domain); and
  • An optional Descriptive part, i.e. a set of independent attributes (aka Nonkey domain).

The database predicate is articulated in the following simple way: [Denotative part] 〈SUBJECT〉↣[Descriptive part]

3.4.1 The subject

As stated, the predicate subject is the mandatory component. The subject occupies one attribute when it is a person or a thing, but a definite denotation of a characteristic entity needs two attributes, and an associative entity subject is the concatenation of one attribute per participating subject.

3.4.2 Descriptive part

Each attribute which is not part of the primary key is a functional aspect of the primary key. This descriptive part of a predicate is technically known as the 'nonkey domain' of R, and each attribute in the nonkey domain is a 'nonkey attribute' [1983Ken]. The set of descriptive attributes is a consistent vector of "orthogonal components" [1977Ris]. In other words, each descriptive attribute describes the subject as if the subject always preceded it and the other descriptive attributes were transparent. Descriptive attributes are necessarily optional; for instance, a binary relation (which is also a predicate) is described with just its two subjects. Trying to describe, as part of the association, some particular property of the participating subjects would introduce database redundancy.

3.4.3 Denotative part

Each attribute or "multiattribute" [1977Ris] able to identify each row of R is a candidate key. Setting aside the primary key (which is itself a candidate key), each remaining candidate key is a definite denotation [1910R&W] of the primary key; a single-attribute candidate key is a denotative attribute of the RM predicate. The set of attributes of the current candidate keys is the 'minor-key' domain of R, and each element of the minor-key domain is a 'key attribute'. If a key attribute is part of a composite candidate key, that key attribute also describes the primary key. The denotative part of the database predicate is optional.
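On a given extension, the attribute sets whose values never repeat can be listed automatically; they are only key candidates, which the designer must still confirm against the business rules. A Python sketch, assuming a hypothetical three-row CITY extension (the Province attribute and all values are invented): with CityCode as primary key, a unique CityName would belong to the denotative part.

```python
from itertools import combinations


def observed_candidate_keys(rows, attributes):
    """Minimal attribute sets whose values are unique in the given rows.

    Only the current extension is inspected, so every result is a key
    candidate to be confirmed against the business rules, not a proof.
    """
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # a superset of a known key is not minimal
            values = [tuple(row[a] for a in combo) for row in rows]
            if len(set(values)) == len(values):
                keys.append(combo)
    return keys


# Hypothetical CITY extension with an extra Province attribute.
CITY = [
    {"CityCode": "MU", "CityName": "Murcia", "Province": "Murcia"},
    {"CityCode": "CT", "CityName": "Cartagena", "Province": "Murcia"},
    {"CityCode": "MD", "CityName": "Madrid", "Province": "Madrid"},
]
print(observed_candidate_keys(CITY, ["CityCode", "CityName", "Province"]))
# [('CityCode',), ('CityName',)]
```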

3.5 The four + one RM/T predicates

«After a review of the relational model, we introduce a classification scheme for entities, properties, and associations» E. F. Codd (1979) [1979Cod].

3.5.1 Class predicate

A class predicate "is not existence dependent on any other entity" [1979Cod] and consists of only one standalone attribute, which is the primary key.

3.5.2 Kernel predicate

The kernel predicate matches our familiar descriptive predicate: a subject and the set of attributes corresponding to the subject's properties. It can be the seed of a characteristic hierarchy or participate as one of the subjects of an associative predicate.

3.5.3 Characteristic predicate

A characteristic predicate "fills a subordinate role" in describing a superordinate individual by a subordinate set of its own predicate instances. The subject of a characteristic entity is an ordered denotation of two parts. A characteristic entity is typically part of a characteristic hierarchy.

3.5.4 Associative predicate

The associative predicate is a two-place predicate like xRy, i.e. 〈x, y〉, but it also recognizes the internal existence of the converse relation, i.e. 〈y, x〉. The two relative subjects are foreign keys, and both are part of the associative primary key.

3.5.5 Denotative predicate

A denotative predicate implements a standalone one-to-one ERA relationship as a relation RS between the relations R and S, with two subjects: the first is the identifier of both R and RS, i.e. the primary key of RS is a foreign key; the second is the denotative part, i.e. an explicit foreign key to S that is also a strong candidate key of RS. The denotative predicate was originally [1979Cod] implicitly part of the associative predicate but, currently, it must be taken into account as the RM/T counterpart of one-to-one ERA relationships. Obviously, the denotative predicate RS is existence dependent on the R and S predicates.
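A denotative predicate can be declared with ordinary SQL constraints. The sketch below uses Python's built-in sqlite3 module; the tables employee, parking_spot and employee_spot are hypothetical stand-ins for R, S and RS. The primary key of RS references R, and its second subject references S and is declared UNIQUE, which makes it a candidate key of RS.

```python
import sqlite3

# Hypothetical schema: employee plays R, parking_spot plays S,
# and employee_spot is the denotative relation RS.
DDL = """
CREATE TABLE employee (emp_no TEXT PRIMARY KEY);
CREATE TABLE parking_spot (spot_no TEXT PRIMARY KEY);
CREATE TABLE employee_spot (
    emp_no  TEXT PRIMARY KEY REFERENCES employee(emp_no),           -- PK is an FK to R
    spot_no TEXT NOT NULL UNIQUE REFERENCES parking_spot(spot_no)  -- FK to S, candidate key
);
"""


def build():
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")  # FK checks are off by default in SQLite
    con.executescript(DDL)
    con.executemany("INSERT INTO employee VALUES (?)", [("E1",), ("E2",)])
    con.executemany("INSERT INTO parking_spot VALUES (?)", [("S1",), ("S2",)])
    con.execute("INSERT INTO employee_spot VALUES ('E1', 'S1')")
    return con


def try_insert(con, emp, spot):
    """Attempt one RS row and report whether the constraints accept it."""
    try:
        con.execute("INSERT INTO employee_spot VALUES (?, ?)", (emp, spot))
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


con = build()
print(try_insert(con, "E2", "S1"))  # rejected: S1 already denotes E1
print(try_insert(con, "E3", "S2"))  # rejected: E3 does not exist in R
print(try_insert(con, "E2", "S2"))  # accepted
```

The existence dependence of RS on R and S is carried by the two foreign keys; the one-to-one character comes from the primary key on one side and the UNIQUE constraint on the other.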

4. DATABASE RELATION

This report presents the three original concepts of a database relation. The relation named R is the protagonist of the relational database field and of all the reports of this communication. The section "Relation, relational, relative" explains why we need the adjective 'relative'. In the other three sections, we try to take advantage of E. F. Codd's papers [1970Cod] and [1971Cod], unifying his own ideas. The sections of the DATABASE RELATION report are the following: • Relation, relational, relative; • Attribute, domain, datatype; • Relation on domains; • Relation on attributes; and • Relation on tuples.

4.1 Relation, relational, relative

Let us explain how we have dealt with the ambiguity of some English words that are the "bread & butter" of the relational database continent discovered by E. F. Codd [1970Cod], up to the point of calling the databases and database management systems (abbr. DBMS) that existed before 1980 "legacy databases".

4.1.1 Relation noun

A database relation is a list of homogeneous instances of the same concept along each of its columns, with a column header naming the concept of each column. A database relation takes the form of a grid report, for instance a "Relation of applicants admitted", the Enrolment of a ship, or a grocery delivery note from 1954 (such as the following one).

'Relation' and 'Table' are synonyms; both mean "database table". "Database relations" are a set of bureaucratic tables. The "database relation" concept has nothing in common with "human relations" or with the "binary relation" between two persons, two entities, or two things.

4.1.3 Relative adjective

The Peircean adjective for binary relations is 'relative' [1881Pei]; it applies to logical connectives, to set operations [1920R&W] on binary relations and to the subjects of binary relations, for instance relative product, relative algebra, relative matrix [1883Pei]. This Peircean adjective lets us speak unambiguously of the part of the PM system dedicated to binary relations, which is normally a conspicuous part of Relational Logic. For instance, Functional Dependency is a binary relation with the relative property of transitivity and plays an insightful role in the relational theory of Dependency; the RM/T associative entity is a set of relative predicates; etc.

4.2 Relation, attribute, domain, datatype

In R (P, F∈C, L∈B, T∈B, E, J), R is a variable for any relation[name] in the database. In the logical data design, a relation is the name of a set of attributes [attribute-names] able to hold the functional information, i.e. a set of values in a row, of an individual or thing in reality. Every attribute[name] has a source of values, which is its domain[name]. If the domain is cited, e.g. P∊B, the attribute is called a referencing attribute, and the cited domain is the home set (or homeset) of the referencing attribute. If the domain is not cited, e.g. B of P∊B, the attribute is called an immediate attribute; in this case, its values come from one of the r-DBMS built-in domains, normally known as data types (or datatypes).

4.2.1 Coordinates of a database cell

Each Attribute(value) binomial is a database cell (or datum) whose coordinates are [1985Cod]: 〈table, primary-key(value), attribute〉

4.3 Relation on domains

Given a relational schema S: {D¹, D², D³ … Dⁿ, R} [1970Cod]:
(1) The R relation is a member of the S schema;
(2) R (P, F∊D³, L∊D¹, T, E, J);
(3) R is also the client of the domains of S, i.e. D¹ ... Dⁿ;
(4) The subset of S formed by D¹ ... Dⁿ is an explicit domain-subschema; today, we would say that each of D¹, D² and D³ is a 'kernel entity', e.g. the table of postal codes of a given country;
(5) Each attribute always has a domain, which can be shared by other attributes, whether of R or of other relations;
(6) If the domains can be ignored, such a relation is represented as R(P, F, L, T, E, J).

4.3.1 Built-in domain

If there is no true domain as the source of scalar values for a given attribute, the attribute takes its values from an implicit range of values offered by the relational DBMS product. Each such implicit domain is a built-in domain known as a data type (or datatype).

4.4 Relation on attributes

[Figure: grocery delivery note, 1954 [1954Nes]]

4.1.2 Relational adjective

The 'relational' adjective is exclusively "database oriented", following Codd [1970Cod]: relational model (abbr. RM), relational constraint, etc.

"Now, instead of depending on the ordering, we use a distinct name for each domain and for each domain citation" [1971Cod]. "And each of these citations is called an attribute of R" [1971Cod]. For example, a relation has six attributes R (P, F, L, T, E, J), while the corresponding domain citations are R (P, F∈C, L∈B, T∈B, E, J). The basic specification of a relational table is its set of attributes; R is the name both of the relation and of its set of attributes, and the set of attributes is known as the predicate intension of R [1910R&W].

4.4.1 Surface and deep structures

In another suggestive paradigm, the flat set of attributes could be understood as the surface structure [1957Cho] of R, and the corresponding deep structure [1957Cho] is:

• The mandatory and minimal primary key (1), followed by two optional substructures:
• The subset of descriptive attributes (2); and
• The set of minimal candidate keys (3).

4.5 Relation on tuples

"The elements of a relation are called tuples" E. F. Codd [1970Cod]. This is Codd's main contribution to Logic, i.e. defining a set of irredundant descriptive predicates [1979Cod] with the details adapted to the Logic field, e.g. "All information in the database is represented as values in tables" [1983S&B], together with the specialties of the new "relational data banks" field [1970Cod]. R* is a set of tuples whose elements are "atomic values" [1970Cod]; in other words, each attribute value is "nondecomposable" [1970Cod].

R*: {(p₀, v₀, d₀, f₀, t₀ …); (p₁, v₁, d₁, f₁, t₁ …); (p₂, v₂, d₂, f₂, t₂ …); …}

R* is the extension of R at a given time [1971Cod]; in other words, the extension itself is a time-varying set of R rows [1971Cod]. The R extension varies exactly as a mechanical reflex of the business reality: at its own rhythm, the database follows its microcosm, and it varies because reality varies. For instance, after a production cycle, the extension of a relation R in a production database has undergone changes in the form of:
• new rows (probably including an initial load, see below, performed by a database administrator);
• changes to some initial cell values; and
• deletions of some entire rows.
The state of R, i.e. R*, is the R extension at a given time. The set of tuples of R is a subset of the Cartesian product of the possible values of all the attributes [1970Cod], i.e. R* ⊊ {P × F × L × T × E × J}. Such a subset captures the state of affairs of R at a given time; there is no shorter way to say it.
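For tiny domains, the subset statement about R* can be checked literally by materializing the Cartesian product. A Python sketch, assuming a hypothetical two-attribute relation with invented domains; it reports both whether R* is a subset of the product and whether it is a proper one.

```python
from itertools import product


def extension_check(rows, domains):
    """Return (is_subset, is_proper_subset) of R* against D1 x ... x Dn.

    Materialising the whole product is only sensible for the tiny domains
    of an illustration; a real check tests each value against its domain.
    """
    space = set(product(*domains))
    state = set(rows)
    return state <= space, state < space


# Hypothetical R(Sex, Dept) with two tiny domains.
DOMAINS = [{"F", "M"}, {"D1", "D2"}]
print(extension_check([("F", "D1"), ("M", "D1")], DOMAINS))  # (True, True)
```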

5. DATABASE TABULATION

The sections of DATABASE TABULATION report are the following: • Example of tabulation; • Tabulation map; • Tabulation body properties; • Spare candidate key; • Built-in domain; • Varchar & blob RM value; • Datatypes for tabulations; • Probing 3NF tabulation; • Cooking R tabulation; • Gallup tabulation; • Catalog of tabulations; • Standardizing R tabulation; • Tabulation oriented DML; and • 3NF mathematical proof.

5.1 Example of tabulation

Our example is a tabulation for the PANEL (Pilot, Fly, Destination, Time, Date) predicate, showing an assignment panel at a given USA airport.

5.1.1 Original ASSIGN credits

We are indebted to David Maier for the original ASSIGN*(Pilot, Flight, Date, Departs) [1983Mai] that inspired the PANEL tabulation.

Pilot     Flight   Date     Departs
Clark     281      08 Aug   05:50am
Copely    281      09 Aug   05:50am
Cushing   083      09 Aug   10:15am
Cushing   116      10 Aug   01:25pm
Clark     083      11 Aug   10:15am
Chin      116      12 Aug   01:25pm
Clark     301      12 Aug   06:35pm
Chin      083      13 Aug   10:15am
Copely    281      13 Aug   05:50am
Copely    412      15 Aug   01:25pm

5.1.2 Tabulation of an airport panel

The tabulation is a tool of the Art Of Normalizing Tables "from the beginning" [1971Cod]. A relation [1979Cod] R consists of a set of tuples at a given time, each tuple having the same set of attributes. From a Logic point of view, the set of tuples is the table extension (or state), and the set of attributes is the basic intension [1971Cod] of this set of rows. In turn, each intension has its corresponding descriptive predicate formula. Discovering the deep sentence starts "given a fair corpus of sentences" [1957Cho]. In Chomskian fashion, "discovering the dependency structure of R starts given a fair corpus of rows of R", which is called the TABULATION of R. To be representative, the R extension of a database tabulation must be RM compliant, i.e. have a primary key matching the corresponding ERA entity specs. If the extension also has other constraints (apart from some candidate keys) in the form of redundant nonkey attributes, they will be disclosed (and "repaired") by the normalization procedure. An explicit reference, i.e. L∊S, can be part of transparent attribute properties. All candidate key constraints must be discovered in order to confirm the ERA identifier or, incidentally, to switch to a new primary key.

PANEL tabulation: following a weekly cycle, every day the assigner refreshes the current panel, selecting a pilot for each flight (which has a destination and a departure time).

5.1.3 Restrictions on PANEL

The following restrictions hold on PANEL:
1. "For each flight there is exactly one time" [1983Mai];
2. "For any given pilot, date, and time, there is only one flight" [1983Mai];
3. "For a given flight and date, there is only one pilot" [1983Mai].
We have added the 'Destination' attribute and some new rows in order to mimic a weekly cycle (around the dates given by Maier); then we will discover new restrictions.
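Maier's three restrictions are functional dependencies, so they can be tested on the ASSIGN rows of 5.1.1 (where Departs plays the role of PANEL's Time). A minimal Python sketch; passing on ten rows only shows that this extension does not contradict the restrictions, it does not prove them.

```python
COLUMNS = ("Pilot", "Flight", "Date", "Departs")
ASSIGN = [
    ("Clark", "281", "08 Aug", "05:50am"),
    ("Copely", "281", "09 Aug", "05:50am"),
    ("Cushing", "083", "09 Aug", "10:15am"),
    ("Cushing", "116", "10 Aug", "01:25pm"),
    ("Clark", "083", "11 Aug", "10:15am"),
    ("Chin", "116", "12 Aug", "01:25pm"),
    ("Clark", "301", "12 Aug", "06:35pm"),
    ("Chin", "083", "13 Aug", "10:15am"),
    ("Copely", "281", "13 Aug", "05:50am"),
    ("Copely", "412", "15 Aug", "01:25pm"),
]


def fd_holds(rows, lhs, rhs):
    """True if every lhs value is paired with a single rhs value in rows."""
    pos = {name: i for i, name in enumerate(COLUMNS)}
    seen = {}
    for row in rows:
        x = tuple(row[pos[a]] for a in lhs)
        y = tuple(row[pos[a]] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True


print(fd_holds(ASSIGN, ["Flight"], ["Departs"]))                   # True  (restriction 1)
print(fd_holds(ASSIGN, ["Pilot", "Date", "Departs"], ["Flight"]))  # True  (restriction 2)
print(fd_holds(ASSIGN, ["Flight", "Date"], ["Pilot"]))             # True  (restriction 3)
print(fd_holds(ASSIGN, ["Departs"], ["Flight"]))                   # False (116 and 412 share 01:25pm)
```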

5.2 Tabulation map

Each relation R may have a designed tabular representation (known as a 'tabulation', as stated):
• Each tabulation has the same name as its future table in production; R here is a variable representing any table under development as a software deliverable.
• The first row is the header, holding the set of assigned and distinct attribute names of R [1971CEF], some of which can be a foreign key [1970Cod], e.g. R.L∊S, referencing another relation, i.e. the S relation.
• To be exact, S in L∊S references the active domain of the primary key of S without mentioning its current name.
• The remaining rows contain the current tuples of R, and this set of rows is the body of the tabulation.

5.3 Tabulation body properties

The tabulation body of rows is a rectangular array [1971CEF] with the following properties:
P1: The tabulation is column-homogeneous; in other words, in any selected column all its body cells are of the same kind [1971CEF].
P2: Each body cell holds a value which is a simple number or a character string [1971CEF]; in other words, all body cells are "atomic values" [1970Cod].
P3: All rows of a table must be distinct, i.e. there is no duplication of rows [1971CEF].
P4: The ordering of rows within a table is immaterial [1971CEF]; in other words, body row order is insignificant [1979Cod].
P5: The ordering of columns within a table is immaterial [1971CEF]; in other words, attribute order does not carry information [1987S&K].
The body tabulation properties are the same as the properties of any relation in production.
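Properties P1 to P3 can be checked on a tabulation body held in memory. A Python sketch, assuming the body is a list of tuples of hashable values and using the Python type of a value as a rough stand-in for the "kind" of P1; P4 and P5 concern order and cannot be violated by the data themselves.

```python
def body_properties(rows):
    """Check P1-P3 on a tabulation body given as a list of tuples.

    P1: every column holds values of a single Python type.
    P2: every cell is a simple number or a character string.
    P3: no two rows are identical (cells must be hashable).
    """
    p1 = all(len({type(v) for v in column}) == 1 for column in zip(*rows))
    p2 = all(isinstance(v, (int, float, str)) for row in rows for v in row)
    p3 = len(set(rows)) == len(rows)
    return p1, p2, p3


# First SUPPLY rows (Project, PartSupplier, Quantity).
SUPPLY = [("J1", "P2S3", 12), ("J1", "P2S1", 17), ("J5", "P1S3", 4)]
print(body_properties(SUPPLY))                        # (True, True, True)
print(body_properties(SUPPLY + [("J1", "P2S3", 12)]))  # (True, True, False)
```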

5.4 Built-in domain

Currently a built-in domain is known as a data type (or datatype). This section presents some of the current datatypes. According to datatype performance as a primary key component, we find:
• Prosaic datatypes; and
• Non-prosaic datatypes.

5.4.1 Prosaic datatypes

A prosaic datatype has its corresponding literal [1985Cod]; consequently, it is a valid datatype for the primary key. Some examples of prosaic datatypes follow [2008IBM].

Data type   Literals
CHAR        'A'  'b'  '7'  '§'  '32'  '12/14/1985'  "19-03-1945"  "prime number"  'DON''T CHANGE'  X'FFFF'  ''
NUMERIC     1  +7  1800.5  64  −15  32767  −15.  025.50  1000.  2.E5  −2.2E−1  +5.E+2  +375893333333333333333.3315E1
DATE        1987-10-12  10/12/1987  12.10.1987  1987-10-12
TIME        1:30 PM  13.30.05  13:30:05

5.4.2 UOM of quantity column

It is worth noting that quantity columns, from a 3NF point of view, must be homogeneous in their Unit-Of-Measure (abbr. UOM) [1990Cor]. However, there is currently no SQL way of enforcing the UOM of quantity columns; this enforcement remains an application duty.

5.4.3 Non-prosaic datatypes

An attribute whose datatype has no literal syntax cannot be part of a primary key (abbr. PK). Accordingly, the following attribute types (or similar) cannot be part of a candidate key: i. BLOB (images, maps, etc.); ii. variable-length attributes whose literal would be bigger than the space available in a standard input screen. The reason for this PK rule is that any row of an RM database must be identifiable by the triple (T, K, v), where T=, K= and v= [1985Cod]. This rule is part of the RM Entity Integrity for r-DBMS engines.

5.5 Varchar & blob RM value

The structured datatypes (such as VARCHAR and BLOB) play an important role as hyper-descriptive attributes, above all in internet servers, but also, for some time now, in marketing, library, magazine and newspaper shops.

5.5.1 Varchar & blob structures

Having an internal structure is essential to the hyper-descriptive performance of these datatypes. Their internal structures are as follows:
- VARCHAR ≝ (, );
- BLOB ≝ (, , ); [examples of blob methods are .pdf, .doc, etc.].

5.6 Datatypes for tabulations

During the TEST phase, each column holds uniform values of one of the three implicit datatypes available for 3NF tabulations:
• ALPHABETIC for columns holding names, such as Name and FirstName; uniform {lowercase↓UPPERCASE};
• NUMERIC for columns holding numbers (such as Quantity);
• ALPHANUMERIC for columns holding codes (such as Project and PartSupplier); use {lowercase↓UPPERCASE}.

5.6.1 Missing and inapplicable marks

According to the irreducible 3NF protocol of UNMARKED NORMAL FORM, missing or inapplicable marks are represented in the tabulations by (IS NULL WITH DEFAULT '␢').

5.7 Probing 3NF tabulation

Designing a 3NF intension of an entity, i.e. its set of cohesive attributes, is considered here just the first step, because we can also include a set of rows intended to prove that the set of attributes really is in third normal form, i.e. free of the design defects whose side effect is data redundancy. This feature has always been associated with the papers of the field, but as a design task it has never been part of the good practices of normalization. At the moment, even accepting the design of the set of rows as a proof that R is in third normal form is not part of the mainstream. We share this recipe for Cooking an R tabulation "AS IS". It is a probing tabulation for an R entity of 3NF class B specs (before a table review). The result of the recipe is an R tabulation proving that the R entity is 3NF compliant. Any reported bugs and CEs are welcome. The proof has the final form of a tabulation, but the recipe is a procedure for designing and assembling the minimal tabulation rows that prove the designed entity is in 3NF. The main interest of this tabulation is opening a 3NF branch in the core of Database Review. The first step is designing 3NF entities "from the beginning", and it is already represented by the axioms of LEGO DATA DESIGN and the patterns of HOLISTIC DATA DESIGN.

"Designing a probing 3NF tabulation" (as part of Table Review) is the second step. Nevertheless, the promise of a mathematical proof that R is in third normal form (also "from the beginning") is more appealing to the writer. Let's go.

5.8 Cooking R tabulation

1) Create the palette of every domain; keep them as part of the descriptive schema of R (which will never be an orphan entity).
2) Count the values per domain.
3) You will need at least as many rows as the number of intervening columns of R plus two, or a number of rows equal to the biggest cardinality of a nonkey attribute plus two, whichever is bigger.
4) Create n rows accordingly; start with the columns in business order and fill in only the primary key and the candidate keys;
5) Order the rows by primary key;
6) Set aside the business ordering of the columns;
7) Set aside the candidate key columns;
8) Arrange the order of the columns (left to right) according to the cardinality of the mentioned active domain, so that the columns go from the biggest active domain at the left to the smallest at the right end.
9) Fill in the blank cells of the table using all the values of each domain;
10) Copy the last row as a new row with a different PK, and change the value of a cell of any small active domain. In this way you have an almost duplicated row, so there is no possibility that the descriptive subset is a big candidate key.
11) The next step is having one duplicated value per column. After this task is finished, order the rows at random (except the twin rows), just for the aesthetic reason of not having all the duplicated values in two adjacent rows;
12) From right to left and from bottom to top: fill in the bottom-left cells (over the last two rows) with the value of the cell above;
13) And so on, until reaching the cell in the top line, which must remain unchanged because it is in the primary key column.
14) Recover the alternate key columns;
15) Recover the functional ordering of all the columns;
16) Mark as foreign keys all the attributes whose domain is not a Boolean (e.g. the Sex attribute).
17) Arrange the domain values in the R entity so that half of the active domains do not use all their possible values. If you do not like editing R, add a new entry to the selected half of the domains. These final retouches prevent R from being a Cartesian product instead of a subset of it, and they are important when creating an associative entity or a denotative entity.
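Two of the steps are simple enough to be written down directly: the row count of step 3 and the near-twin row of step 10. A Python sketch; the function names, the sample figures and the sample row (including its destinations) are hypothetical.

```python
def minimum_probe_rows(n_columns, nonkey_cardinalities):
    """Step 3: at least (columns + 2) rows, or (largest nonkey cardinality + 2)
    rows, whichever is bigger."""
    return max(n_columns, max(nonkey_cardinalities, default=0)) + 2


def near_twin(row, pk_index, new_pk, changed_index, new_value):
    """Step 10: copy a row, give it a different PK and change one cell
    that belongs to a small active domain."""
    twin = list(row)
    twin[pk_index] = new_pk
    twin[changed_index] = new_value
    return tuple(twin)


# Hypothetical PANEL-like figures: 5 columns, nonkey domains of 4 and 3 values.
print(minimum_probe_rows(5, [4, 3]))  # 7
print(near_twin(("r9", "Chin", "083", "Denver"), 0, "r10", 3, "Boston"))
# ('r10', 'Chin', '083', 'Boston')
```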

(Step5) DOMAINS: Get the current active domain of each extracted column; (Step6) DISGUISE: For protected data, disguise the alphabetic identifiers.

5.10 Catalog of tabulations

Each Functional Analysis contains its own "Catalog of tabulation models" in "first normal form", which is the List of Functional Reports, all of them still pending population with 1NF rows. The normalizer would benefit from populating the tables in her or his care with either of the methods explained before.

5.10.1 The normal virtuous circle

Therefore, {Functional analysis + normalization} forms a virtuous circle around a given set of functional entities: {(Functional Entities specs) → (Report layout) → (Report attribute set) → (NORMALIZATION) → (subschema of 3NF tables) → (SQL create tables) → (Functional Entities specs refinement)} [exit of refinement cycle]. At the exit of this cycle, the Functional Entities specs are complete, and the next step could be creating the SQL VIEW of each report. The subschema of 3NF tables has a feedback effect on the documentation of the previous step, i.e. the functional analysis of ERA entities and ERA relationships, but with "no more ERA objects" in the care of the current normalizer; data design continues by creating an SQL VIEW for each report, etc.

5.11 Standardizing R tabulation

A standard tabulation must be a gallup extraction from big tables or a current copy of manageable ones. Standardizing the 1NF/3NF tabulations is an important task not only among company normalizers but also for creating a sector (or even national) culture of sharing tabulations among all the designers committed to "zero defect data" [1991Han] design "from the beginning" [1971Cod].

5.12 Tabulation oriented DML

«Functional dependencies are patterns in data that may be observed due to corresponding regularities among the real world objects which are to be modeled by the data design» J. M. Janas (2003) [2003Jan]. There are two dependency-oriented Data Manipulation Language (abbr. DML) statements, i.e. two relational set operators that E. F. Codd specifically dedicated to the three tasks of normalizing relations [1979Cod]:
• Project supports the setting up of Dependency graphs, i.e. the graphs allowing empirical functional dependency observations [1983Mai];
• Project also factorizes the original, redundant 1NF rows into several new, irredundant sets of 3NF rows according to the model of the new predicates;
• Natural Join checks whether a 3NF table set (of more than one table), e.g. R {RA, RB, RC …}, recovers the original set of 1NF rows, as the proof that each of the new relations is in 3NF.
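The Project / Natural Join check can be run on Maier's ASSIGN rows from 5.1.1. A minimal Python sketch; the split into (Flight, Departs) and (Pilot, Flight, Date) is our illustrative choice, motivated by restriction 1 (Flight → Departs), while the second split is deliberately bad and yields spurious tuples.

```python
HEADER = ("Pilot", "Flight", "Date", "Departs")
BODY = {
    ("Clark", "281", "08 Aug", "05:50am"), ("Copely", "281", "09 Aug", "05:50am"),
    ("Cushing", "083", "09 Aug", "10:15am"), ("Cushing", "116", "10 Aug", "01:25pm"),
    ("Clark", "083", "11 Aug", "10:15am"), ("Chin", "116", "12 Aug", "01:25pm"),
    ("Clark", "301", "12 Aug", "06:35pm"), ("Chin", "083", "13 Aug", "10:15am"),
    ("Copely", "281", "13 Aug", "05:50am"), ("Copely", "412", "15 Aug", "01:25pm"),
}
ASSIGN = (HEADER, BODY)


def project(relation, attrs):
    """R[attrs]: keep the listed columns and drop duplicate rows."""
    header, body = relation
    idx = [header.index(a) for a in attrs]
    return tuple(attrs), {tuple(row[i] for i in idx) for row in body}


def natural_join(r, s):
    """Join r and s on all the attributes they have in common."""
    (h1, b1), (h2, b2) = r, s
    common = [a for a in h1 if a in h2]
    extra = [a for a in h2 if a not in h1]
    body = set()
    for t1 in b1:
        for t2 in b2:
            if all(t1[h1.index(a)] == t2[h2.index(a)] for a in common):
                body.add(t1 + tuple(t2[h2.index(a)] for a in extra))
    return h1 + tuple(extra), body


def recovers_original(relation, attrs1, attrs2):
    """True if R[attrs1] joined with R[attrs2] gives back exactly R."""
    header, body = relation
    joined_header, joined = natural_join(project(relation, attrs1),
                                         project(relation, attrs2))
    order = [joined_header.index(a) for a in header]  # back to R's column order
    return {tuple(row[i] for i in order) for row in joined} == body


print(recovers_original(ASSIGN, ("Flight", "Departs"), ("Pilot", "Flight", "Date")))  # True
print(recovers_original(ASSIGN, ("Pilot", "Flight"), ("Flight", "Date", "Departs")))  # False
```

In the second split, flight 281 joins Clark with 09 Aug, a tuple that never existed.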

5.9 Gallup tabulation

(Step1) SOURCE: Look for a populated 1NF source table;
(Step2) DOMAIN: Get the current active domain of each source column;
(Step3) GALLUP: Compute the minimum number of rows needed for a 90% probability of having the same value-set distribution as the full set of 1NF source rows [1993A&T]; "A simple algorithm for inferring the set of functional dependencies from a random extracted subset of an existing full tuple set. It is shown that the upper bound of the sample complexity, which is the number of example tuples required to obtain a set of functional dependencies whose error is at most ε with a probability of at least 1−δ, where n denotes the size of the full tuple set" [1993A&T].

5.13 3NF mathematical proof

The semantic combination of (1) refactoring a 1NF R into several 3NF predicates after analysing the redundancy of the R set of rows with the FD; (2) populating the new 3NF tables with the corresponding irredundant rows; and (3) recovering the original set of redundant rows using as input the current population of 3NF rows IS THE PROOF THAT THE CURRENT R WAS IN 1NF AND THAT ITS NORMALIZATION HAS BEEN MATHEMATICALLY CORRECT.

(Step4) EXTRACT: Randomly extract Gallup's number of 1NF rows from the source table;

5.13.1 The 3NF human factor

The opposite scenario is the emergence of new tuples in the recovered original table (known as "spurious" tuples), or the disappearance of some original tuples. In both cases, the fault is always due to a poor design of the set of rows of the R tabulation under inspection.

6. DATABASE DEPENDENCY

The DATABASE DEPENDENCY report includes two key deliverables:
• The main paragraphs related to Functional dependency (abbr. FD) of E. F. Codd's IBM Research Report RJ909, "Further Normalization of the Data Base Relational Model" (1971) [1971Cod], which is the seminal paper opening the formal analysis of database redundancy;
• Following Russell, FD is a binary relation [1910R&W] & [2009Mad], represented as xFDy or, commonly, x→y; early in 1971 [1971Cod], Codd discovered FD as a mathematical Logic tool, which was the key to his agile search for the cause of the unexpected data redundancy in the Relational Model data bank prototypes [1976Ast].
The following sections develop the DATABASE DEPENDENCY report: • Project operator; • Active domain; • Dependency graph; • Functional dependency; • Armstrong axioms; • Functional nondependency; • Dependency Boolean algebra; • Boolean Algebra legend; • Functional codependency; • Functional independency; • Functional dependency on R subsets; • Functional dependency 1.2; • Elementary FD; • Trivial dependency; • Mathematical logic x→y taxonomy; • Properties of xRy; • Relative properties of x→y; • Relative properties of x↛y; • FD semantics; • Multivalued dependency 1.2; and • Functional dependency family 1.2 code card.

6.1 Project operator

The Project [1979Cod] relational operator generates a derived relation from a base relation; its syntax simply writes the intended projection of R, e.g. "R[P, L, …]". "A projection, e.g. R[P, L, …], is the flying relation obtained by dropping all columns of R except those specified by the list, e.g. [P, L, etc.], and then dropping redundant duplicate rows" [1979Cod]. The three projections R[P], R[L] and R[P, L] tell us whether L functionally depends on P (or not); such an FD is then an observed FD. In 1971, projections P, L and PL were specified as ΠP(R), ΠL(R) and ΠP,L(R) [1971Cod]. The graphical representation remains the same.
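The counting argument behind the three projections: R[P, L] always has at least as many rows as R[P], and exactly as many when each P value carries a single L value (symmetrically for R[L]). A Python sketch over hypothetical rows; the destinations are invented for illustration.

```python
def project(rows, columns):
    """R[columns]: keep the listed columns, drop duplicate rows."""
    return {tuple(row[c] for c in columns) for row in rows}


def observed_fds(rows, p, l):
    """Compare |R[P]|, |R[L]| and |R[P, L]| for the current extension."""
    n_p, n_l = len(project(rows, [p])), len(project(rows, [l]))
    n_pl = len(project(rows, [p, l]))
    return {f"{p}->{l}": n_p == n_pl, f"{l}->{p}": n_l == n_pl}


# Hypothetical flight rows (the destinations are invented for illustration).
ROWS = [
    {"Flight": "281", "Destination": "Boston"},
    {"Flight": "083", "Destination": "Denver"},
    {"Flight": "116", "Destination": "Boston"},
    {"Flight": "281", "Destination": "Boston"},
]
print(observed_fds(ROWS, "Flight", "Destination"))
# {'Flight->Destination': True, 'Destination->Flight': False}
```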

6.2 Active domain

"The set of values represented at some instant" in each column of R is the "active domain" [1970Cod] of that column. Let us present the SUPPLY tabulation.

SUPPLY
Project   PartSupplier   Quantity
J1        P2S3           12
J1        P2S1           17
J5        P1S3           4
J5        P2S1           17
J5        P3S1           23
J5        P4S2           4
J7        P3S2           9
J7        P4S2           9

For instance, the projection SUPPLY[Project, Quantity] specifies the active domain of those two columns, i.e. the active domain of the combined values of Project and Quantity, at a given time:

Active domain
Project   Quantity
J1        12
J1        17
J5        4
J5        17
J5        23
J7        9
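The active domain above can be reproduced by projecting the SUPPLY rows and dropping duplicates. A minimal Python sketch; sorting is only for a readable output, since row order is immaterial.

```python
SUPPLY = [
    ("J1", "P2S3", 12), ("J1", "P2S1", 17), ("J5", "P1S3", 4),
    ("J5", "P2S1", 17), ("J5", "P3S1", 23), ("J5", "P4S2", 4),
    ("J7", "P3S2", 9), ("J7", "P4S2", 9),
]
COLUMNS = ("Project", "PartSupplier", "Quantity")


def active_domain(rows, *names):
    """Distinct (combined) values currently present in the named columns."""
    idx = [COLUMNS.index(n) for n in names]
    return sorted({tuple(row[i] for i in idx) for row in rows})


print(active_domain(SUPPLY, "Project", "Quantity"))
# [('J1', 12), ('J1', 17), ('J5', 4), ('J5', 17), ('J5', 23), ('J7', 9)]
```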

6.2.1 Composite active domain

As stated, two or more attributes of R can form a composite active domain.

6.3 Dependency graph

«A system constructed on the basis of observable things and their observable properties and relations» R. Carnap (1937) [1937Car]. «If ƒ:x→y is thought of as a set of ordered pairs {〈a, b〉 | a∊DOM(A) and b∊DOM(B)}, then at every point in time for a given value of a∊DOM(A) there will be at most one value of b∊DOM(B)» Bernstein (1976) [1976Ber]. The dependency graph is an ordered triple of sets P = 〈XYGraph, XReferent, YRelatum〉:
• XYGraph is a finite set of ordered pairs 〈x, y〉;
• XReferent contains the finite range of left-side elements of XY;
• YRelatum contains the finite range of right-side elements of XY.
We display each dependency graph in the following sequence: XReferent, XYGraph, YRelatum.
⇨ "The projection R[x,y] is at every instant of time a function from R[x] to R[y]" [1971Cod], and it is a dependency graph;
⇨ A dependency graph is a relational graph, i.e. just the list of a functional dependence between the independent values of attribute x and the dependent values of attribute y IS MEANINGFUL;
⇨ The dependency graph was designed by E. F. Codd for observing (or not) the ordered functional dependency pattern between two attributes;
⇨ Moreover, a dependency graph allows simultaneous observation of the "R[x] towards R[y]" and "R[y] backwards R[x]" functional dependencies.
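A dependency graph and its two readings can be computed from the value pairs alone. A Python sketch, using the Flight and Departs values of Maier's ASSIGN rows in 5.1.1 as the pairs 〈x, y〉; the function names are hypothetical.

```python
def dependency_graph(pairs):
    """Build the triple <XYGraph, XReferent, YRelatum> from (x, y) value pairs."""
    graph = set(pairs)
    referent = {x for x, _ in graph}
    relatum = {y for _, y in graph}
    return graph, referent, relatum


def read_both_ways(graph, referent, relatum):
    """Observe x -> y and y -> x on the same graph.

    x -> y holds when each referent appears in exactly one pair, i.e. the
    graph has as many pairs as there are referents; reading the pairs
    backwards gives the same test for y -> x.
    """
    return len(graph) == len(referent), len(graph) == len(relatum)


# Flight and Departs values taken from Maier's ASSIGN rows.
PAIRS = [("281", "05:50am"), ("281", "05:50am"), ("083", "10:15am"),
         ("116", "01:25pm"), ("083", "10:15am"), ("116", "01:25pm"),
         ("301", "06:35pm"), ("083", "10:15am"), ("281", "05:50am"),
         ("412", "01:25pm")]
graph, referent, relatum = dependency_graph(PAIRS)
print(read_both_ways(graph, referent, relatum))  # (True, False)
```

Flight → Departs is observed, while reading backwards fails because 01:25pm is paired with both 116 and 412.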

6.3.1 Dedekind's transform

"The definition of a function as a correspondence between two arbitrary sets (not necessarily consisting in numbers) comes from Dedekind in 1887" [2013EoM]. «By a transformation [Abbildung] ƒ of a system S, we understand a law according to which to every determinate element s of S, there belongs a determinate thing which is called the TRANSFORM OF s and denoted by ƒ(s)» [1901Ded].

2016-01-2412:51:18 Page 11 of 76

6.3.2 The relational graph

We use the term “graph” in the mathematical sense of "the graph of y=ƒ(x)". A functional dependence from x to y can be read directly from the table of ordered pairs of values 〈x, y〉, i.e. from the graph that reveals whether the TRANSFORM OF exists (Digest of [1834Lob] + [1972B&S]). We do not need an approximating function; we only need to know which of the four possible FD patterns, i.e. x→y, x←y, x↛y or x↚y, matches the current FD graph.

6.4 Functional dependency Functional dependency is a binary relation [1910R&W] discovered by Codd in 1971 [1971Cod] while formalizing the data design defects that caused unexpected data redundancy in the databases of the relational model prototypes, e.g. System R of IBM [1976Ast]. Let R be the variable name of the current entity, e.g. R (P, F, L, T, E, J); R is also the variable name of its set of attributes, i.e. R: {P, F, L, T, E, J}, while R* represents the set of rows of R at a given instant. "Attribute F of relation R is functionally dependent on attribute P of R, if and only if, at every instant of time, each value in P has no more than one value in F associated with it under R*" [1971Cod]. In other words, the projection R[P, F] is at every instant of time a function R[F]=ƒ(R[P]). The functional dependency holds even though the content of the relational graph, i.e. R[P, F], can be, and usually will be, time varying. The main result is that we can write in the FD specs one of the four dependency types of the FD family, for instance the familiar FD: we write R.P→R.F if F is functionally dependent on P in R* [1971Cod].
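Codd's "at every instant of time" can only be approximated by the states actually observed: a finite history can refute R.P→R.F but never prove it. The sketch below assumes each state R* is a list of tuples; `fd_holds_now`, `fd_holds` and the three states of R(P, F) are hypothetical.

```python
def fd_holds_now(rows, columns, lhs, rhs):
    """R.lhs -> R.rhs in one state R*: each lhs value has at most one rhs value."""
    il = [columns.index(a) for a in lhs]
    ir = [columns.index(a) for a in rhs]
    image = {}
    for row in rows:
        key = tuple(row[i] for i in il)
        value = tuple(row[i] for i in ir)
        if image.setdefault(key, value) != value:
            return False
    return True


def fd_holds(states, columns, lhs, rhs):
    """'At every instant of time', approximated by every observed state R*."""
    return all(fd_holds_now(rows, columns, lhs, rhs) for rows in states)


# Hypothetical states of R(P, F): the content changes, the dependency does not.
COLUMNS = ("P", "F")
t0 = [("p1", "f1"), ("p2", "f1")]
t1 = [("p1", "f2"), ("p2", "f1")]                # p1 was updated; still functional
t2 = [("p1", "f1"), ("p1", "f2"), ("p2", "f1")]  # p1 now has two F values

print(fd_holds([t0, t1], COLUMNS, ["P"], ["F"]))      # True
print(fd_holds([t0, t1, t2], COLUMNS, ["P"], ["F"]))  # False
```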

6.4.1 True functional dependency

As a binary relation of Logic, x→y has a converse, although Codd does not mention it: x←y ≝ y→x. In this context, x→y is read "x determines y" [1974Arm], and its converse form y←x is properly read "y depends on x" [1910R&W].

6.5 R & FD specs in Codd notation

Codd notation is the standard language for documenting R and its FD specs. Let us take a complete example of this notation to explain its details. The explanation of FD specs starts with the relation itself, whose name is R₄ [1971Cod]; the dependency notation will refer to this relation.
R₄(X#, S#∊SUPPLIER, P#∊PART, J#∊PROJECT, Q, SC)
where: S# = supplier number; P# = part number; J# = project number; X# = order serial number; Q = quantity; SC = supplier city.
The example includes all primitive FD's:
• X#, being the primary key, is the determinator node;
• (S#, P#, J#), being a composite candidate key, is a collateral node; {X#↔(S#, P#, J#); …
• The line between X# and Q is an intransitive line, …X#→Q; …, whose converse is an implicit nondependent line (Q↛X#);
• The line between (S#, P#, J#) and Q is a determines line whose converse is an implicit nondependent line; this determines line is collateral-transitive, i.e. it is not a primitive dependency line: (S#, P#, J#)→Q;
• The line between S# and SC is an intransitive line whose converse is an implicit nondependent line; … S#∊SUPPLIER→SC};
• The trivial connection between (S#, P#, J#) and S#, i.e. (S#, P#, J#)→S#, is in this case a regular line; S#↛(S#, P#, J#) is its implicit converse line.
The classical primitive FD specs of R₄ are:
R₄: {X#↔(S#∊SUPPLIER, P#∊PART, J#∊PROJECT); X#→Q; (S#, P#, J#)→S#; S#→SC}.
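The R₄ specs can be exercised with the standard attribute-closure algorithm of textbook FD theory, which follows from Armstrong's axioms; it is named here as a swapped-in technique, not as the procedure of this paper. The sketch assumes each FD is a pair of Python sets, and `R4_FDS` and `closure` are hypothetical names. It shows that the collateral-transitive line (S#, P#, J#)→Q need not be stored as a primitive FD, since it is derived through X#, and that the trivial line (S#, P#, J#)→S# comes for free because a closure always contains its own attributes.

```python
# Primitive FD specs of R4, each FD as (determinant set, dependent set).
# X#<->(S#, P#, J#) is written as two FDs, one per direction.
R4_FDS = [
    ({"X#"}, {"S#", "P#", "J#"}),
    ({"S#", "P#", "J#"}, {"X#"}),
    ({"X#"}, {"Q"}),
    ({"S#"}, {"SC"}),
]


def closure(attributes, fds):
    """Textbook attribute closure: every attribute derivable from `attributes`."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD whenever its determinant is already reached.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


print(sorted(closure({"S#", "P#", "J#"}, R4_FDS)))
print(sorted(closure({"S#"}, R4_FDS)))
```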

6.5.1 Reminder of 1NF compliance

“S#→(S#, P#, J#)” always contradicts “(S#, P#, J#)→S#”, because each candidate key is “a maximal functionally independent subset” of R and there is no room for a codetermines socket inside a CK. This is part of the automatic 1NF compliance of every RM table. The generalization of this reminder is part of FD 2.1, under the x⊋y and x→y instantiations.

6.6 Caveat on our FD versioning All the statements of Codd [1971Cod] on Functional Dependency are very accurate regarding attribute→attribute behavior in R, but not so regarding attribute→subset, subset→attribute and subset→subset. We ask the reader to accept with us that all the earlier seminal FD work of Codd & Heath, which is in fact unlabeled, is Functional Dependency 1.1. As could hardly be otherwise, FD 1.1 contains many seminal ideas, e.g. x↔y, that become part of FD 1.2, FD 2.1 or FD 2.2 as the discussion requires.

6.7 Functional nondependency Codd complemented Functional dependency with the definition of Functional nondependency. We write R.E↛R.J if J is not functionally dependent on E in R* [1971Cod], and in general x↛y is read "x nondetermines y".

6.8 Nondetermines extended definition Let us apply the logical NOT to the formal FD definition in order to see how the NOT(FD) device works. Attribute x of relation R functionally nondetermines attribute y of R if, at some instant of time, some value in x has more than one value in y associated with it under R*. The dependency graph of an FD is a functional transform at every instant, whereas a NOT (functional transform) may still contain some particular functional cases without impairing its essence of being NOT functionally dependent. It is now time to explain how to observe and interpret a dependency graph.
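The logical NOT of the FD definition amounts to a search for a single counterexample: one instant and one x value with more than one y value. The sketch below assumes states are lists of tuples; `nondetermines_witness` and the R(E, J) rows are hypothetical. Note that e2 stays functional throughout, which does not impair E↛J.

```python
def nondetermines_witness(states, columns, x, y):
    """Search observed states for a counterexample to x -> y.

    Returns (instant, x_value, y_values) for the first x value paired with more
    than one y value, or None if x -> y held in every observed state.
    """
    ix, iy = columns.index(x), columns.index(y)
    for instant, rows in enumerate(states):
        images = {}
        for row in rows:
            images.setdefault(row[ix], set()).add(row[iy])
        for x_value, y_values in images.items():
            if len(y_values) > 1:
                return instant, x_value, y_values
    return None


# Hypothetical R(E, J): at instant 1 only e1 misbehaves; e2 stays functional.
COLUMNS = ("E", "J")
states = [
    [("e1", "j1"), ("e2", "j3")],
    [("e1", "j1"), ("e1", "j2"), ("e2", "j3")],
]
print(nondetermines_witness(states, COLUMNS, "E", "J"))
```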

6.9 Dependency graph interpretation

With the active domain of an ordered pair of attributes of a relation R, Codd could observe the type of graph of Project vs. Quantity. Three projections, SUPPLY[Project], SUPPLY[Project, Quantity] and SUPPLY[Quantity], give us the holistic information: "Quantity functionally depends on Project (or not)" AND "Project functionally depends on Quantity (or not)".

Figure 2. Projections forming a relative graph

There are four possible observations [1983Mai]:
• IF one-one THEN Project↔Quantity;
• IF many-one THEN Project↣Quantity;
• IF one-many THEN Project↢Quantity;
• IF many-many THEN Project↮Quantity.
'one' means a functional behavior (implying it is so every time); 'many' means a NOT (functional behavior), implying {("sometimes it is NOT functional") but ("sometimes it is functional")}. In this case, the two FD's observed at a glance are many-many, i.e. Project↛Quantity and Quantity↛Project: J1 is paired with both 12 and 17, and 17 is paired with both J1 and J5. The holistic observation Project↮Quantity between two attributes came before the definition of three of the four compact FD connectors, but it pushed in the same direction, and the full definition of the NOT(FD) device is here to stay. Please, see below: Nondetermines (reserved).
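The four observations can be sketched as a small classifier over one snapshot of SUPPLY. This assumes a single instant is representative, which the FD definition does not guarantee; `is_function` and `classify` are hypothetical helpers.

```python
COLUMNS = ("Project", "PartSupplier", "Quantity")
SUPPLY = [
    ("J1", "P2S3", 12), ("J1", "P2S1", 17),
    ("J5", "P1S3", 4), ("J5", "P2S1", 17),
    ("J5", "P3S1", 23), ("J5", "P4S2", 4),
    ("J7", "P3S2", 9), ("J7", "P4S2", 9),
]


def is_function(graph):
    """True if no left-side value is paired with two different right-side values."""
    image = {}
    for a, b in graph:
        if image.setdefault(a, b) != b:
            return False
    return True


def classify(rows, columns, x, y):
    """Classify the dependency graph R[x, y] as one of the four observations."""
    ix, iy = columns.index(x), columns.index(y)
    graph = {(row[ix], row[iy]) for row in rows}
    forward = is_function(graph)                        # observed x -> y ?
    backward = is_function({(b, a) for a, b in graph})  # observed y -> x ?
    if forward and backward:
        return "one-one", "↔"
    if forward:
        return "many-one", "↣"
    if backward:
        return "one-many", "↢"
    return "many-many", "↮"


print(classify(SUPPLY, COLUMNS, "Project", "Quantity"))  # ('many-many', '↮')
```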

6.10 First FD formalizations

The first formalizations of Functional dependency were the properties of "Functional relations" [1973D&C] and the famous Four Axioms of William W. Armstrong [1974Arm].

6.11 Delobel & Casey FD properties Delobel & Casey FD properties [1973D&C] were the first formalization of the properties of "functional relations". The formalization includes the representation of composite attributes, which originally were written "E: {E1, E2, …}" [1973D&C].
Let R be the non-empty set of attributes of R, {x, y, z, w, v, u}, whose elements can form binary nodes that are pairwise disjoint, in order to mimic the behavior of two different attributes of R.
DC1. Transitivity: IF x→y & y→z THEN x→z;
DC2. Reflexivity: IF x∊Ω THEN x→x;
DC3. Projectivity: IF x⊃y THEN x→y;
DC4. Additivity: IF x→y & x→z THEN x→(y·z);
DC5. Pseudotransitivity: IF x→y & (y·z)→w THEN (x·z)→w;
DC6. Augmentation: IF x→y & (x·z) THEN (x·z)→y;

6.12 Armstrong FD axioms «The dependency structure axioms (ℱ1.) to (ℱ4.) are not intended to be an optimal choice, however, they do provide a standard for verifying the correctness and completeness of other axioms system» W. W. Armstrong (1974).
Let R be the non-empty set of attributes of R, {x, y, z, w, v, u}, whose elements can form binary nodes that mimic the pairwise attribute-attribute behavior when the elements of each wff such as x→y are pairwise disjoint subsets.
ℱ1: IF x∊R THEN x→x;
ℱ2: IF x→y & y→z THEN x→z;
ℱ3: IF (x·y) & y→(z·w) THEN (x·y)→w;
ℱ4: IF x→y & z→w THEN (x, z)→(y, w);
Definition 1: Armstrong FD axioms
The set of four axioms is an abstraction of the Delobel & Casey FD properties [1973D&C]: "It is easy to verify that (DC1), (DC3), and (DC4) together are equivalent to our axioms for full families of dependencies" [1974Arm].

6.13 Dependency Boolean algebra «All that is mathematizable merits be mathematized» François Bresson (La Sorbonne, Paris, 1968) [1969Lab]. The following eight definitions are the core of an FD Boolean algebra [2009Mad], following the wonderful FORMULAS AS RELATIONS track of Maddux [2000Mad]. R is the universe of individual attributes. R² is the set of all x→y, although only those FD's that come from observing a relational graph between pairs of different attributes are relevant for the FD structure. In the FD structure of R no attribute is repeated: each attribute simply occupies its place, either as a node, e.g. (P)→(L), or as part of a subset, e.g. (P·L)→(T).

6.13.1 Determines (original) everytime (x→y)

Attribute x of relation R functionally determines attribute y of R if, at every instant of time, each value in x has no more than one value in y associated with it under R*.

6.13.2 Relative UNION A∪B

{x→y} OR {z→w}

6.13.3 Relative INTERSECTION A∩B

{x→y} AND {z→w}

A and B are generic subsets of 〈x, y〉 whose meaning starts and ends with this dense Logic discourse. The order of exposition follows the ordering of the succinct mathematical Logic

6.13.4 Determined (converse) everytime (x←y)

{〈x, y〉: y→x}
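Because each FD spec is a set of ordered pairs of attributes, relative UNION, relative INTERSECTION and the converse map directly onto Python set operations. The sets A and B below are hypothetical FD specs chosen only for illustration, and `converse` is a hypothetical helper.

```python
# Hypothetical FD specs over attributes of R; each pair (x, y) stands for x -> y.
A = {("P", "F"), ("P", "L")}
B = {("P", "L"), ("L", "T")}


def converse(relation):
    """Determined (converse): the pair (x, y) belongs to it when y -> x."""
    return {(y, x) for x, y in relation}


union = A | B          # relative UNION A∪B
intersection = A & B   # relative INTERSECTION A∩B
print(sorted(union))
print(sorted(intersection))
print(sorted(converse(A)))
```

Applying the converse twice returns the original relation, as expected of a converse.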

6.13.5 Relative Identity Id

{〈x, x〉: x∈U}

6.13.6 Relative Diversity Di

{〈x, y〉: x, y∈U AND x ≠ y}

6.13.7 Nondetermines (reserved) sometimes (x↛y)

Attribute x of relation R functionally nondetermines attribute y of R if, at some instant of time, some value in x has more than one value in y associated with it under R*.

(x↛y) is the Boolean complement of (x→y): x↛y ≝ {〈x, y〉: ¬(x→y)}. However, we have given a standalone definition of the nondetermines binary relation, so that future work on a solid multiattribute definition will not fall into a logical loop.

6.13.8 Nondetermined sometimes (x↚y)

{〈x, y〉: y↛x}

Normally, (x↚y) is the De Morgan complement of (x→y) [2006Mad]: x↚y ≝ {〈x, y〉: ∼(x→y)}. However, in order to underline the standalone definition of (x↛y), (x↚y) will appear as the converse of (x↛y).

6.14 ℛ Boolean Algebra legend Maddux has generalized the Boolean algebra concepts to binary relations with the following mathematical Logic devices. There is U: {a, b, c, d, e, ...}, the existing individuals under the scope of the x, y, z variables. There is U²: {〈a, a〉, 〈a, b〉, 〈a, c〉, 〈a, d〉, 〈a, e〉, ...}, where each 〈x, y〉 represents not only an existing pair but also an implicit xRy relative predicate. And here is the 'click': the R of the xRy formula is not only the generic binary relation R but also a predicate variable, and the predicate instances under the scope of the R variable are elements of an explicit set called ℛ. All predicate instances of ℛ belong to the binary relation class, i.e. ℛ := {∅, Id, Di, A, B, C, D, ℛ²}, where 'Id' is the identity relation, i.e. x=y; 'Di' is the diversity relation, i.e. x≠y; A is the original binary relation, e.g. >; B is the converse of >, i.e. <; C is the complement of >, i.e. ≤; and D is the converse of the complement of >, i.e. ≥.
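The relations of the legend can be checked on a small universe of integers, reading A as >. This is a hedged sketch: `U`, `U2`, `converse` and `complement` are hypothetical names, and U is finite only so that the sets can be enumerated. The sketch also confirms that the converse of the complement (D) equals the complement of the converse (B); in the FD family, the same constructions give ←, ↛ and ↚ from →.

```python
# Finite universe U and its square U², as in the ℛ legend.
U = range(4)
U2 = {(a, b) for a in U for b in U}


def converse(r):
    """Swap the elements of every ordered pair."""
    return {(b, a) for a, b in r}


def complement(r):
    """All pairs of U² that are not in r."""
    return U2 - r


Id = {(a, a) for a in U}               # identity:  x = y
Di = complement(Id)                    # diversity: x ≠ y
A = {(a, b) for a, b in U2 if a > b}   # original relation: >
B = converse(A)                        # converse of >
C = complement(A)                      # complement of >
D = converse(C)                        # converse of the complement of >
```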


6.15 Multivalued dependency 1.2 "A multivalued dependency is one that is not a functional dependency" [1990Cod]. Multivalued dependency (abbr. MVD), i.e. x↠y, is a synonym of Nonfunctional dependency (abbr. ¬FD), i.e. x↛y. As stated above, "x↛y" is read "x does not determine y", whereas "x↠y" is read "x multidetermines y". Multivalued dependency [1977Fag] is the same binary relation as nondetermines [1971Cod] under another name. From now on, "multivalued dependency" is an important synonym of "nondetermines".

Multidetermines ≡ Nondetermines, majority of time (x↠y)
x↠y ≝ {〈x, y〉: x↛y}

Definition 2: Multivalued dependency Formally, "there is a multivalued dependence from A to B (A↠B) if, at any point in time, a fact about A determines a set of facts about B" [1985Smi].

6.15.1 FD & MVD integration

The integration of functional dependency and multivalued dependency was an objective long held in mind but never closed to the liking of the key players.

6.16 Functional dependency family 1.2 code card The full family of Functional dependency 1.2 applies to any ordered pair 〈x, y〉 of two attributes of R, or of two proper subsets in the power set of R provided they are disjoint (in order to mimic the attribute-attribute behavior). For every 〈x, y〉 with x, y∊ℙ(R), x≠y and x∩y=∅:
Name | Symbol | Definition
Determines | x→y | {〈x, y〉: ∀y′∊y (x→y′) AND ∄x′⊊x (x′→y′)}
Determined | x←y | {〈x, y〉: y→x}
Nondetermines | x↛y | {〈x, y〉: ∃y′∊y: ¬(x→y′)}
Nondetermined | x↚y | {〈x, y〉: y↛x}
Multidetermines | x↠y | {〈x, y〉: x↛y}
Multidetermined | x↞y | {〈x, y〉: y↠x}

In the case of Functional dependency, the original binary relation is →, the converse is the ← binary relation, the complement is ↛ and the De Morgan complement is ↚. Consequently, the FD Boolean algebra signature is ℛ := {∅, Id, →, ←, ↛, ↚, ℛ²}, whose development, following Maddux [2009Mad], has been carried out in Dependency Boolean algebra. It is worth noting that ↠, i.e. the multidetermines binary relation, is an alias of ↛, i.e. the nondetermines binary relation, and that ↞, i.e. the multidetermined binary relation, is an alias of ↚, i.e. the nondetermined binary relation.

6.17 Functional codependency «If both x→y and y→x hold then we write x↔y» Codd (1971). If both R.P→R.F and R.F→R.P hold, then at all times R.P and R.F are in one-to-one correspondence, and we write R.P↔R.F [1971Cod]. Let us define x↔y as a relative product, i.e.: x↔y ≝

6.18 Functional independency Codd uses the concept of "Functional independency" in "The collection of attributes of R in a candidate key of R is a maximal functionally independent subset" [1971Cod], although he never defined Functional independency formally. However, the definition of functional independency is just a combination of two nondetermines binary relations, as the following paraphrase of Codd shows.

6.18.1 IndependentOf definition If both R.E↛R.J and R.J↛R.E hold at some times, then R.E and R.J are in many-to-many correspondence, and we write R.E↮R.J.

6.19 Functional dependency on R subsets "The definition given above can be extended to collection of attributes" [1971Cod] along the following steps: FD compares concatenated subset values; Providing subsets were disjoint; Representation principle; Maximally independent subsets of R; Every attribute of y depends on full x; Full subset y depends on full subset x; and No attribute of x depends on the full y.

6.19.1 FD compares concatenated subset values

"Thus, if x, y are distinct collections of attributes of R, y is functionally dependent on x if, at every instant of time, each x-value has no more than one y-value associated with it under R. The notation x→y, x↛y introduced for individual attributes is applied similarly to collections of attributes" [1971Cod].

6.19.2 Providing subsets were disjoint

In the case of two overlapping subsets of R, the attribute-attribute behavior does not apply, because two attributes never overlap each other; thus both subsets of R must be not only distinct but disjoint in order to mimic the attribute-attribute FD behavior, which gives the following paraphrase of Codd: "Thus, if x, y are disjoint collections of attributes of R, y is functionally dependent on x if, at every instant of time, each x-value has no more than one y-value associated with it under R. The notation x→y, x↛y introduced for individual attributes is applied similarly to collections of attributes" [1971Cod].
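Comparing concatenated subset values can be sketched by turning each x-value and y-value into a tuple. The sketch assumes lists of attribute names and rejects overlapping collections, mirroring the disjointness requirement above; `subset_fd` and the R(P, F, L) rows are hypothetical.

```python
def subset_fd(rows, columns, x, y):
    """y is functionally dependent on x, comparing concatenated subset values.

    x and y are lists of attribute names and must be disjoint, so that the
    pair mimics the attribute-attribute behavior.
    """
    if set(x) & set(y):
        raise ValueError("x and y must be disjoint collections of attributes")
    ix = [columns.index(a) for a in x]
    iy = [columns.index(a) for a in y]
    image = {}
    for row in rows:
        x_value = tuple(row[i] for i in ix)  # concatenated x-value
        y_value = tuple(row[i] for i in iy)  # concatenated y-value
        if image.setdefault(x_value, y_value) != y_value:
            return False
    return True


# Hypothetical state of R(P, F, L): L depends on (P·F) but on neither part alone.
COLUMNS = ("P", "F", "L")
ROWS = [("p1", "f1", "l1"), ("p1", "f2", "l2"), ("p2", "f1", "l3")]
print(subset_fd(ROWS, COLUMNS, ["P", "F"], ["L"]))  # True
print(subset_fd(ROWS, COLUMNS, ["P"], ["L"]))       # False
```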

6.19.3 Representation principle When two or more joint collections of attributes occur, the designer discards all except one, which will be the determinator of the resulting 3NF table, taking into account that, at the second level of the FD structure, that future determinator is a simple node acting as determinant. This rule is similar to the selection of the PK, but here it applies to two or more collections of attributes that happen to clash with the mandatory refactoring step. Please, see Alternate key vanishment in the DESIGNER RELATION report. This is the core reasoning that supports the Representation principle. Selecting the primary key is just the first step in the normalization business. A corollary follows: please, do not jeopardize the "arbitrary" [1971Cod] primary key selection!

6.19.4 Subsets of R Let R be the variable name of the current entity, e.g. R (P, F, L, T, E, J); R is also the variable name of its set of attributes, i.e. R: {P, F, L, T, E, J}. R* represents the set of rows of R at a given instant. Ω is the power set of R; x, y, z and w are pairwise disjoint elements of Ω, i.e. disjoint subsets of R. Thus, if x, y are subsets of R, y is functionally dependent on x if, at every instant of time, each x-value has no more than one y-value associated with it under R*.

6.20 Every attribute of y depends on full x "Suppose x and y are two disjoint subcollections of the attributes of relation R, y″ is an attribute of y, and x→y″ holds" [1971Cod] for every y″; then:
{IFF x→y″ for every attribute y″ in y THEN x→y}, whose formalization is:
(x→y) ≝ ∀y″∊y (x→y″)   (1)
Definition (1) was always implicit, but its converse, i.e. {IF x→y THEN x→y″ for every attribute y″ in y} (2), was documented by Zaniolo [1982Zan] in 1982. The formal formula of (2) follows:
IF x→y THEN ∀y″∊y (x→y″)   (2)
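Definitions (1) and (2) together say that a dependency on a collection y is the conjunction of the dependencies on its single attributes. The hedged sketch below checks that equivalence exhaustively for one hypothetical state of R(P, F, L, T), over every pair of disjoint non-empty subsets; `determines`, `determines_each` and `nonempty_subsets` are hypothetical helpers. A single state cannot establish an FD over time, but the equivalence itself holds state by state: if two rows agree on x and differ on y, they differ on at least one attribute y″ of y.

```python
from itertools import combinations


def determines(rows, columns, x, y):
    """x -> y in one state, comparing concatenated values of x and of y."""
    ix = [columns.index(a) for a in x]
    iy = [columns.index(a) for a in y]
    image = {}
    for row in rows:
        key = tuple(row[i] for i in ix)
        value = tuple(row[i] for i in iy)
        if image.setdefault(key, value) != value:
            return False
    return True


def determines_each(rows, columns, x, y):
    """Right-hand side of (1): x -> y″ for every single attribute y″ of y."""
    return all(determines(rows, columns, x, [a]) for a in y)


def nonempty_subsets(attributes):
    for size in range(1, len(attributes) + 1):
        yield from combinations(attributes, size)


# Hypothetical state of R(P, F, L, T).
COLUMNS = ("P", "F", "L", "T")
ROWS = [
    ("p1", "f1", "l1", "t1"),
    ("p1", "f1", "l1", "t2"),
    ("p2", "f1", "l2", "t1"),
    ("p3", "f2", "l2", "t1"),
]

# (1) and (2) together: both sides agree for every pair of disjoint subsets.
agree = all(
    determines(ROWS, COLUMNS, x, y) == determines_each(ROWS, COLUMNS, x, y)
    for x in nonempty_subsets(COLUMNS)
    for y in nonempty_subsets(COLUMNS)
    if not set(x) & set(y)
)
print(agree)  # True
```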

6.21 Full subset y depends on full subset x "Intuitively, y subset is functionally dependent on the whole of x but not on any subset of it" [1971Cod]. Formally, it is stronger and clearer to require that no proper subset x′ of x determines any attribute y″ of y, "in other words", that no attribute of y functionally depends "on any subset of x (other than x itself)" [1971Cod]:
(x→y) ≝ ∀y″∊y (x→y″) AND ∄x′⊊x: (x′→y″)   (3)

6.22 No attribute of x depends on the full y We also assume that no attribute of x depends on the full subset y. This assumption implies, with all the more reason, that no attribute of x depends on a part of subset y.

6.23 Brochure of FD versions

6.23.1 Functional Dependency 1.2 Functional Dependency 1.2 is a strategic Boolean algebra [2009Mad] that formalizes FD as the binary relation [1910R&W] it always was. In addition, the Multidetermines [1977Fag] binary relation (abbr. MVD) is integrated in this algebra, and each member of the FD family is classified following Russell's standard [1903Rus] according to the symmetry, transitivity and reflexivity of its xRy type. FD 1.2 allows the early third normal form classification of Heath.

6.23.2 Functional dependency 2.1 Functional dependency 2.1 formulas have FD sockets, which are compact connectors in the fashion of Codd's x↔y, and the first FD socket is the IndependentOf socket, i.e. x↮y, which is the seed of solid composite attributes. All of this is developed in the MAXIMAL INDEPENDENT SUBSET report.

6.23.3 Functional dependency 2.2 Functional dependency 2.2 starts when FD 2.1 sockets apply to the solid composite attributes, aka Maximal Independent Subsets (abbr. Miß, Mißes; miß, mißes). A miß remains disjoint from any other miß; besides, it combines happy, unused ideas and nomenclature of Codd with an insightful assembly of R subsets of any size in Quine's fashion [1981Qui]. The protagonist of the PANEL tabulation dependency analysis is the FD 2.2 family.

6.24 Functional dependency 1.2

Let x, y be disjoint subsets of R; let x′ be a subset of x and y′ a subset of y; let x″ be an attribute of subset x and y″ an attribute of subset y. The definite formula of Codd's Functional dependency follows:
(x→y) ≝ ∀y″∊y (x→y″)   every attribute of subset y of R depends on the full subset x of R;
AND ∄x′⊊x: (x′→y″)   no attribute of y depends on a proper subset of x;
AND ∀x″∊x: (y↛x″)   no attribute of x depends on the full y.
After defining Functional dependency 1.2, x→y can be an Elementary FD [1990E&N], and a trivial dependency can be easily defined.

6.25 Elementary FD (x→y) is an Elementary FD [1990E&N] (abbr. EFD) when the dependent y is a single attribute.

6.26 Trivial dependency "A functional dependency of the form R.D→R.E where E is a subset of D will be called trivial dependency" [1971Cod]. The trivial dependencies are those between an irredundant subset of R and its own proper subsets (which remain irredundant). x⊋y, i.e. "x Superset-of-but-Not-equal-to y", is an asymmetrical, transitive and irreflexive binary relation.

Figure 3. Trivial FD structure of x, y, z attributes

Any irredundant subset of R can form trivial dependencies, e.g. R: {x, y, z}, and so can every proper subset in the power set of {x, y, z}. The lattice [1974Arm] of Figure 3 is equipped with this possibility.

6.26.1 Trivial FD + nontrivial FD = nontrivial FD However, a mix of trivial and nontrivial FD's is not trivial at all, for instance the (ℱ3.) Armstrong axiom: IF x⊃y & y→z & z⊃w THEN x→w [1974Arm]. The dependency pattern of the second normal form that solves a partial dependence design defect includes a trivial dependence, which the 3NF-oriented (ℱ3.) axiom captures in the following way: IF (x·y)→z & y→z THEN (x·y)→y & y→z;

6.27 x→y is a binary relation Functional dependency (abbreviated FD) is the binary relation xFDy, i.e. a simple two-place predicate instantiating the xRy ordered formula [1910R&W], [1965H&L] and [2006Mad]. xRy is a mathematical Logic formula equivalent to Pxy (the Logic predicate formula), but specifically for subject-to-subject relationships. Thus, x→y is an instance of the xRy "pictoric formula" [1921Wit].

6.27.1 Specialties of FD Some specialties of FD follow. x→y is read "y depends on x". x is the referent of Russell [1910R&W]; it was called the "determinant" by Heath [1971Hea]. y is the relatum [1910R&W] or the "dependent", according to Codd [1971Cod]. x and y attributes are elements of the same set R, and 〈x, y〉 ordered pairs are elements of R².

6.27.2 Observable FD is TRUE

The predicate Pxy can be TRUE or FALSE [1965H&L]; however, the predicate xRy on two definite sets, or on a shared definite set (as R is), has the following properties: (1) x and y are existing subjects of a set; (2) the predicate xRy is TRUE [1910R&W]. In other words, a binary relation such as (x→y) is the Logic counterpart of a real relationship and cannot enter the Logic discourse with a value of FALSE.

6.27.3 x↛y possibility

The Logic informant also has other means for its labour, for instance documenting, with the complement of (x→y), which is (x↛y), that attribute y does not depend on attribute x.
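Read this way, FD specs record only TRUE facts: where x→y fails, the informant records x↛y instead of a FALSE x→y. The sketch below assumes single attributes and one observed state; `determines`, `fd_facts` and the rows of R(P, F, T) are hypothetical. Exactly one fact is stored per ordered pair of distinct attributes.

```python
def determines(rows, columns, x, y):
    """Single-attribute x -> y in one observed state."""
    ix, iy = columns.index(x), columns.index(y)
    image = {}
    for row in rows:
        if image.setdefault(row[ix], row[iy]) != row[iy]:
            return False
    return True


def fd_facts(rows, columns):
    """Record exactly one TRUE fact per ordered pair of distinct attributes:
    (x, '→', y) when x determines y here, otherwise (x, '↛', y)."""
    facts = set()
    for x in columns:
        for y in columns:
            if x != y:
                arrow = "→" if determines(rows, columns, x, y) else "↛"
                facts.add((x, arrow, y))
    return facts


COLUMNS = ("P", "F", "T")
ROWS = [("p1", "f1", "t1"), ("p2", "f1", "t2")]  # hypothetical state
facts = fd_facts(ROWS, COLUMNS)
for fact in sorted(facts):
    print(*fact)
```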

6.27.4 Codd's FD specs of R

Codd's FD specs have the form of a set, and every ordered pair is unique, which prevents both duplications and contradictions, for instance:
∃!〈P, L〉: {(P→L) NOR (P↛L) NOR (P←L) NOR (P↚L)};
That is, only one of the four possible dependencies between an ordered pair of attributes can be documented.

6.28 Properties of xRy The standard classification of binary relations as Logic predicates focuses on three independent and relevant areas: symmetry, transitivity and reflexivity.

6.28.1 xRy positive behavior "A relation is called symmetrical if, whenever it holds between x and y, it holds also between y and x. A relation is called transitive if, whenever it holds between x and y, and between y and z, it holds also between x and z. A relation is called reflexive when it holds between a term and itself" [1910R&W]. In PM generic terms [1965H&L]:
o R is symmetrical: IF xRy THEN yRx;
o R is transitive: IF xRy & yRz THEN xRz;
o R is reflexive: IF x∊U THEN xRx.

6.28.2 xRy negative behavior "When the converse is incompatible with the original relation, as in such cases as greater and less, I call the relation asymmetrical" [1908Rus]. In PM generic terms [1965H&L]:
• R is asymmetrical: IF xRy THEN ¬(yRx);
• R is intransitive: IF xRy & yRz THEN ¬(xRz);
• R is irreflexive: IF x∊U THEN ¬(xRx).

6.28.3 xRy unpredictable behavior "In intermediate cases, I call the relation not-symmetrical" [1908Rus]. Unpredictable behavior arises when the relation xRy has an empirical component, e.g. (x loves y): such a y may love this x back (or not), and so on for every ordered pair of the U×U matrix. In PM generic terms [1965H&L]:
R is non-symmetrical: IF xRy THEN {yRx NOR ¬(yRx)};
R is non-transitive: IF xRy & yRz THEN {xRz NOR ¬(xRz)};
R is non-reflexive: IF x∊U THEN {xRx NOR ¬(xRx)}.

6.29 Relative properties of x→y Determines (binary relation) is non-symmetrical, transitive and reflexive. Its instances of these properties follow:
Non-symmetrical: IF x→y THEN {y→x NOR y↛x};
Transitive: IF x→y & y→z THEN x→z;
Reflexive: IF x∊U THEN x→x.
The non-symmetrical property of X→Y means "If X→Y in R, this does not say whether or not Y→X in R" [2007E&N].

6.29.1 Relative properties of x←y As expected, the Depends On binary relation is also non-symmetrical, transitive and reflexive:
Non-symmetrical: IF x←y THEN {y←x NOR y↚x};
Transitive: IF x←y & y←z THEN x←z;
Reflexive: IF x∊U THEN x←x.

6.30.1 Relative properties of x↚y

As expected, the Nondetermined binary relation is non-symmetrical, non-transitive and irreflexive. We leave its specific development to the reader.
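The three grades of the properties of xRy (positive, negative, unpredictable) can be checked mechanically on finite relations. This is a hedged sketch: `properties` and `_grade` are hypothetical helpers, and vacuous cases (for instance a relation with no chains aRb, bRc) fall into the positive grade, as in classical logic. The sample LOVES relation is invented to show the unpredictable grades.

```python
def _grade(checks, yes, no, mixed):
    """'yes' if every check holds, 'no' if none holds, otherwise 'mixed'."""
    checks = list(checks)
    if all(checks):
        return yes
    if not any(checks):
        return no
    return mixed


def properties(rel, universe):
    """Classify a finite relation by symmetry, transitivity and reflexivity."""
    symmetry = _grade(((b, a) in rel for a, b in rel),
                      "symmetrical", "asymmetrical", "non-symmetrical")
    transitivity = _grade(((a, d) in rel for a, b in rel for c, d in rel if b == c),
                          "transitive", "intransitive", "non-transitive")
    reflexivity = _grade(((a, a) in rel for a in universe),
                         "reflexive", "irreflexive", "non-reflexive")
    return symmetry, transitivity, reflexivity


GREATER = {(a, b) for a in range(3) for b in range(3) if a > b}
LOVES = {("a", "a"), ("a", "b"), ("b", "a"), ("b", "c")}  # hypothetical
print(properties(GREATER, range(3)))
print(properties(LOVES, {"a", "b", "c"}))
```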

6.30 Relative properties of x↛y The Nondetermines binary relation is non-symmetrical, non-transitive and irreflexive:
Non-symmetrical: IF x↛y THEN {y↛x NOR y→x};
Non-transitive: IF x↛y & y↛z THEN {x↛z NOR x→z};
Irreflexive: IF x∊U THEN NOT(x↛x).

6.31 FD semantics When x→y (or y←x) holds in R, it has the following semantic facets:
• "y functionally depends on x" [1971Cod];
• "x determines y" [1974Arm];
• "y is a fact about x" [1983Ken];
• "y describes x" [1965H&L];
• "y sorts x" [1965H&L];
and, definitively:
IF {R: {P, F, L, T, E, J} is in 1NF} AND {T and J are attributes of the same table R} AND {T→J}
THEN {T is the primary key of RT} AND {J is a descriptive attribute of RT.T} AND {RT (T, J)} AND {RT already is a 3NF table in the mind of the minor god of data design}
END-IF

7. FUNCTIONAL DEPENDENCY 2.1 The first application of Functional dependency theory ought to be the formal definition of the composite attribute of R and the standard way of making safe compositions for definite descriptions, remembering that both the subject identifier itself and the predicating part of an RM/T polyadic predicate can be a definite description. For a database denotation, please see Characteristic primary key. The MAXIMAL INDEPENDENT SUBSET report has the following sections:
• Evolution of dependency connectors;
• FD 1.1 instantiation;
• Functional dependency 2.1;
• FD 2.2 boolean inference rules;
• Δ Inference laws (grid report);
• Δ Socket composition overview;
• Δ Transitive laws;
• Δ Euclidean circle;
• Δ Symmetrical laws;
• Δ Pro-transitive laws;
• Δ Empirical laws; and
• Δ Empirical disambiguation.

7.1 Evolution of dependency connectors In the next picture, taking advantage of Codd's "Transitive Dependence of C on A under R" diagram [1971Cod], we have isolated his own dependency connectors, completed the other absent connectors and composed the three absent sockets, also following the shape of the arrows Codd already used for his codetermines socket.

Evolution of Codd's dependency connectors

7.2 FD 1.1 instantiations

Let us define the two possible instantiations of x→y in reality:
x→y ≝ {〈x, y〉: x, y∊R, {x↣y} NOR {x↔y}}
The formula reminds us that x→y is just part of the reality, i.e. the empirical converse of every x→y is either {y→x} or {y↛x}, and that only one of the two connections will exist: x→y can instantiate as x↣y or as x↔y (but not both).

7.2.1 x⊋y instantiation

The instantiation of a trivial dependency is not trivial:
IF x⊋y THEN x↣y
The formula reminds us that, x being a miß, e.g. (A·B·C), the instance "(A·B·C)→A" of "(A·B·C)⊋A" can be "(A·B·C)↣A" but not "(A·B·C)↔A": x⊋y can only instantiate as x↣y. On the other hand, the structural instantiation of x⊋y is always "pulled by" an empirical FD chained at its right, e.g. y→z, yielding:
IF {(x·y); y→z} THEN (x·y)→(y)→(z).

7.2.2 Formal x=x instantiation

The x=x instantiation formula is "IF x=x THEN x↔x". However, this instantiation is a mere formality: x→x is never an instance of the FD specs, because a directed acyclic graph (digraph) is "a directed graph with no directed paths from any node to itself" [1983Mai].

7.3 Functional dependency 2.1 The Functional dependency 2.1 formulas have a compact FD format. Δ is the mark of FD 2.1, which in turn is the acronym of "Functional Dependency 2.1". We start FD 2.1 with the IndependentOf connector because it is the seed of solid composite attributes. FD 2.2 starts when FD 2.1 sockets apply to Maximal Independent Subsets (abbr. Miß, Mißes; miß, mißes), after their formal definition and once the reader is able to compose a miß of any size. The Functional dependency 2.1 family includes:
o Functional independency 2.1;
o Functional dependency 2.1;
o Functional converse dependency 2.1; and
o Functional codependency (original advance of 2.1).

7.3.1 Functional independency 2.1

Functional independency or IndependentOf socket, i.e. (x)↮(y) ≝ {〈x, y〉: x↛y and y↛x};

7.3.2 Functional dependency 2.1

Standard functional dependency or DeterminantOf socket, i.e. (x)↣(y) ≝ {〈x, y〉: x→y and y↛x};

7.3.3 Functional converse dependency 2.1

Left-to-right functional dependency or DependentOn socket, i.e. (x)↢(y) ≝ {〈x, y〉: x←y and y↚x};

7.3.4 Functional codependency (original)

Functional codependency or CodeterminantOf socket comes from Codd [1971Cod], here written in the Maddux fashion of all the previous formal definitions of the FD family, i.e. (x)↔(y) ≝ {〈x, y〉: x→y and y→x};

7.3.5 Functional dependency 2.1 code card

Socket | Symbol | Definition
IndependentOf | (x)↮(y) | {〈x, y〉: x↛y and y↛x}
DeterminantOf | (x)↣(y) | {〈x, y〉: x→y and y↛x}
CodeterminantOf | (x)↔(y) | {〈x, y〉: x→y and y→x}
DependentOn | (x)↢(y) | {〈x, y〉: x←y and y↚x}

7.4 FD 2.2 boolean inference rules There are sixteen FD 2.2 inference rules, which result from composing two-by-two the four holistic relative products: ↮, ↣, ↢ and ↔.

7.4.1 Initial definitions

xɷz ≝ {〈x, z〉: x↮z NOR x↣z NOR x↢z NOR x↔z}; xɷz means that any of the FD 2.1 sockets is observable;
x—z ≝ {〈x, z〉: x↣z NOR x↢z NOR x↔z}; x—z means that any of the DeterminantOf, DependentOn or CodeterminantOf sockets is observable; a less descriptive formula would be: x—z ≝ {〈x, z〉: NOT(x↮z)}.

7.5 Δ Inference laws (grid report) We present the holistic product ɷ in the form of a grid report. Each premise is part of a cell of an axis, and each inference occupies its crossing cell. As expected, there are 16 full-duplex dependency axioms.

ɷ | y↮z | y↣z | y↢z | y↔z
x↮y | xɷz | x↮z NOR x↣z | x↮z NOR x↢z | x↮z
x↣y | x↮z NOR x↣z | x↣z | xɷz | x↣z
x↢y | x↮z NOR x↢z | xɷz | x↢z | x↢z
x↔y | x↮z | x↣z | x↢z | x↔z

Definition 3: FD inference laws (grid report)
Eleven laws can be used for completing the dependency structure. The five compositions whose consequent is xɷz (or includes it) are the patterns that can only be filled in empirically, i.e. by observing the corresponding projections of the R tabulation; otherwise, you will only have beautiful tautologies.

7.6 Δ Socket composition overview
• Three empirical laws (H1-H3);
• Three transitivity laws (T1-T3);
• Four Laws of Symmetry (S1-S4); and
• Six pro-Transitive Laws (P1-P6).
The normalizer or data inspector would normally aim to discover the primitive dependency structure as soon as possible, using the two-way transitive laws. The primitive FD structure corresponds to the tabulation of R (it cannot be otherwise), but such an FD structure, which is unique, refers to the R predicate, which is generic (even without rows). The presentation of laws starts with the transitive laws, which are familiar, and finishes with the rules that detect patterns without a clear Logic consequent: the empirical holes that must be filled in with a dependency graph observation. The table inspector seems to go the other way around: observing first, then filling in every cell of an empty dependency matrix, and developing the corresponding dependency specs by stepwise refinement until a dependency diagram is reached. We call the primitive dependency structure the "deep structure", "condensed structure", "compact structure" or "dependency tree".
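The sixteen compositions can be reproduced by brute force under one explicit assumption: only attribute-to-attribute FDs among three attributes x, y, z are modelled, and a configuration is admissible when it is closed under transitivity (ℱ2). Composite nodes such as (z·x) are out of scope. The names `ARCS`, `is_transitive`, `socket` and `grid` are hypothetical. A consequent set containing all four sockets corresponds to an xɷz cell.

```python
from itertools import combinations

NODES = ("x", "y", "z")
ARCS = [(a, b) for a in NODES for b in NODES if a != b]  # candidate a -> b


def is_transitive(fds):
    """Closed under ℱ2; pairs a -> a are implicit (reflexivity) and skipped."""
    return all((a, d) in fds
               for a, b in fds for c, d in fds
               if b == c and a != d)


def socket(fds, a, b):
    """FD 2.1 socket between a and b: ↔, ↣, ↢ or ↮."""
    forward, backward = (a, b) in fds, (b, a) in fds
    if forward and backward:
        return "↔"
    if forward:
        return "↣"
    if backward:
        return "↢"
    return "↮"


# Group every admissible configuration by its two premises (x-y, y-z)
# and collect the x-z sockets that can occur with them.
grid = {}
for size in range(len(ARCS) + 1):
    for chosen in combinations(ARCS, size):
        fds = set(chosen)
        if not is_transitive(fds):
            continue
        premises = (socket(fds, "x", "y"), socket(fds, "y", "z"))
        grid.setdefault(premises, set()).add(socket(fds, "x", "z"))

for premises in sorted(grid):
    print(premises, sorted(grid[premises]))
```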

7.7 Δ Transitive laws T1: IF x▪→y & y▪→z THEN x▪→z; T2: IF x←▪y & y←▪z THEN x←▪z; T3: IF x↔y & y↔z THEN x↔z;

Armstrong's ℱ2 axiom (IF x→y & y→z THEN x→z) and its converse ℱ2-1 are completed here by considering the holistic dependency relationship of the (x, y) and (y, z) pairs, which yields holistic transitive patterns.

7.8 Δ Euclidean circle The T3 law is a strong Euclidean circle. Euclidean relations are a class of binary relations that formalize Euclid's "Common Notion 1" in The Elements:

E. Villar THIRD NORMAL FORM FUNDAMENTALS

2016-01-24 12:51:18 Page 17 of 76

«Things which equal the same thing also equal one another.»: ∀abc∈X {IF aRb AND aRc THEN bRc}. (Source: Wikipedia).

Given any two of the three relationships, the third one is implied: T3a: IF y↔z & z↔x THEN y↔x; T3b: IF z↔x & x↔y THEN z↔y; T3c: IF x↔y & z↔y THEN z↔x;

T1 and T2 also have enhanced formulations, for instance: Euclidean Condensation formulas. (STUB)

7.9 Δ Symmetrical laws S1: IF x←▪y & y↔z THEN x←▪z; S2: IF x↔y & y▪→z THEN x▪→z; S3: IF x▪→y & y↔z THEN x▪→z; S4: IF x↔y & y←▪z THEN x←▪z; S2 and S3 are a kind of FD disambiguation theorem for Armstrong's ℱ2 transitive axiom; S1 and S4 do the same job for its converse, ℱ2-1. The Δ symmetrical laws continue the preceding Δ transitive laws, but the result is no longer exactly a transitive pattern: the codetermines socket appears in one of the premises but not in the other. These laws are clearly holistic, and they have a certain symmetrical flavour between the determines (or depends) premise and the corresponding consequent.

7.10 Δ Pro-transitive laws Pro-transitive laws exist because the codetermines socket is transitive. Given the patterns on the left, the consequent is the one on the right; otherwise, transitivity gets broken. P1: IF x↔y; y↮z THEN x↮z; P2: IF x↮y; y↔z THEN x↮z; P3: IF x•→y; y↮z THEN {x↮z NOR x•→z}; P4: IF x↮y & y←•z THEN {x↮z NOR x←•z}; P5: IF x↮y & y•→z THEN {x↮z NOR x•→z}; P6: IF x←•y; y↮z THEN {x↮z NOR x←•z}; The P1 and P2 compositions, (x↔y; y↮z; x↮z) and (x↮y; y↔z; x↮z), have the same DA02 singular pattern: IF x↔y & (z·x) & (z·y) THEN (z·x)•→x & x↔y. P3 has two compositions: (x•→y; y↮z; x•→z) and (x•→y; y↮z; x↮z). The first yields x•→(y·z); the second has a DA01 singular pattern: IF x•→y & (z·x) & (z·y) THEN (z·x)•→x & x•→y. The final patterns of P4 follow P3: z•→(x·y) and (x·z)•→z & z•→y. The first composition of P5, (x↮y; y•→z; x↮z), has a DA01 singular pattern: IF y•→z & (x·y) & (x·z) THEN (x·y)•→y & y•→z. The second composition of P5, (x↮y; y•→z; x•→z), implies (y⊋z; x⊋z). This P5 composition matches Triad #7, which is a fragment of two ℱ3 axioms: IF y⊋z & (z→w & y⊉w) THEN y→z and IF x⊋z & (z→w & x⊉w) THEN x→z. Remark: both compositions of P6, (x←•y; y↮z; x↮z) and (x←•y; y↮z; x←•z), are the converse compositions of P5, and they yield conclusions parallel to those of the previous paragraph.

7.11 Δ Empirical laws As stated, the empirical laws are the logical holes that tie the dependency structure to the micro-world of reference; without a minimal set of primitive observations, the previously mentioned laws cannot run. E1: IF (x▪→y and y←▪z) THEN x↮z NOR x•→z NOR x←•z NOR x↔z; E2: IF (x←▪y and y▪→z) THEN x↮z NOR x•→z NOR x←•z NOR x↔z; E3: IF (x↮y and y↮z) THEN x↮z NOR x•→z NOR x←•z NOR x↔z;


7.12 Empirical disambiguation The empirical holes that can only be filled in with the OBSERVE method are the following: H1: IF (x▪→y and y←▪z) THEN OBSERVE(R*;x::z); H2: IF (x←▪y and y▪→z) THEN OBSERVE(R*;x::z); H3: IF (x↮y and y↮z) THEN OBSERVE(R*;x::z); The OBSERVE method incorporates the thirteen programmable laws, which allow the logical filling in of the cells whose premises are already known. This method saves observations when dealing with atomic attributes, and it restricts both the number of candidate attributes for a new composite attribute and its potential dependent attributes. Given a new pair 〈x, y〉 of Ω2, the OBSERVE method takes advantage of the current state of the dependency matrix. It is opportunistic, but it cannot follow an observing strategy the way the inspector's mind can.
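A minimal sketch of an OBSERVE-like procedure follows, assuming the rows are Python dictionaries. It is not the author's OBSERVE method. For simplicity it programs only the nine compositions that have a single consequent (T1-T3, S1-S4, P1-P2). It observes the tabulation for every other cell, including the narrower P3-P6 cases. `PROGRAMMED`, `observe` and `fill_matrix` are hypothetical names.

```python
# Compositions with a single consequent: (socket x/y, socket y/z) -> socket x/z.
PROGRAMMED = {
    ("↣", "↣"): "↣", ("↢", "↢"): "↢", ("↔", "↔"): "↔",  # T1, T2, T3
    ("↢", "↔"): "↢", ("↔", "↣"): "↣",                    # S1, S2
    ("↣", "↔"): "↣", ("↔", "↢"): "↢",                    # S3, S4
    ("↔", "↮"): "↮", ("↮", "↔"): "↮",                    # P1, P2
}
CONVERSE = {"↣": "↢", "↢": "↣", "↔": "↔", "↮": "↮"}


def determines(rows, a, b):
    """a→b holds in the tabulation."""
    seen = {}
    for row in rows:
        if seen.setdefault(row[a], row[b]) != row[b]:
            return False
    return True


def observe(rows, a, b):
    """Empirical step: read the socket of <a, b> off the tabulation."""
    forward, backward = determines(rows, a, b), determines(rows, b, a)
    return {(True, True): "↔", (True, False): "↣",
            (False, True): "↢", (False, False): "↮"}[(forward, backward)]


def fill_matrix(rows, attrs):
    """Fill the dependency matrix, observing only when no programmed law applies."""
    matrix, observations = {}, 0
    for i, x in enumerate(attrs):
        for z in attrs[i + 1:]:
            inferred = None
            for y in attrs:  # look for an already known middle attribute
                if (x, y) in matrix and (y, z) in matrix:
                    inferred = PROGRAMMED.get((matrix[x, y], matrix[y, z]))
                    if inferred is not None:
                        break
            if inferred is None:
                inferred = observe(rows, x, z)
                observations += 1
            matrix[x, z], matrix[z, x] = inferred, CONVERSE[inferred]
    return matrix, observations


# Hypothetical rows: Code and Flight codetermine each other, Flight determines Time.
rows = [
    {"Code": "A", "Flight": 1, "Time": "09:00"},
    {"Code": "B", "Flight": 2, "Time": "09:00"},
    {"Code": "C", "Flight": 3, "Time": "10:00"},
]
matrix, observations = fill_matrix(rows, ("Code", "Flight", "Time"))
print(observations, matrix["Flight", "Time"])  # 2 ↣  (the third cell comes from S2)
```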

8. MAXIMAL INDEPENDENT SUBSET The first application of functional dependency theory ought to be the formal definition of a composite attribute of R and a standard way of safely composing definite descriptions. Remember that both the subject identifier itself and the predicating part of an RM/T polyadic predicate can be a definite description. For a database denotation, please see the Characteristic primary key section. The MAXIMAL INDEPENDENT SUBSET report has the following sections: • Multiattribute; • Multiattribute notation; • Multiattribute miscellanea; • Redundant multiattribute; • Multiattribute context; • From binary to polyadic relations; • Composing irredundant subsets; • Construing a polyadic miß; • Assemble a triadic miß; • Dependency structure; • Condensed dependency structure; • Functional dependency specs; and • FD 2.2 structure of PANEL.

8.1 Multiattribute The neologisms 'multiattribute' and 'multicolumn' of a relation come from Rissanen [1977Ris]. Codd uses 'multiattribute' [1979Cod], 'attribute collection' and 'subset' [of R], for instance: "The notation introduced for individual attributes is applied similarly to collections of attributes" [1971Cod] or to subsets of R.

8.1.1 Irredundant multiattribute

A multiattribute M is irredundant if its internal attribute components are shown to be mutually independent "in much the same way as factors in a Cartesian product or orthogonal components of a vector" [1977Ris]. Internal independence seems to give "an easy-to-understand criterion for irredundancy" [1977Ris].

8.1.2 Maximal independent subset (miβ)

«A maximal functionally independent subset of attributes of R is not necessarily a candidate key» [1971Cod]. Following this idea, we will try to define a generic irredundant multiattribute that serves for candidate keys, as usual, but not only for them, i.e. an irredundant format that also applies to the nonkey subset of a relation in third normal form.


A maximal independent subset (abbr. miβ) is a "maximal functionally independent subset" [1971Cod] of R, i.e. a collection of attributes in which "every proper subset of attributes is functionally independent of every other proper subset of attributes" [1971Cod], provided they are disjoint (in order to mimic the attribute-attribute behavior). The maximal independence concept is a "fine grain" formulation of the irredundant multiattribute concept.
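Codd's definition can be tested directly against a tabulation, with the usual caveat that rows can only refute a dependency. In this sketch `determines`, `independent` and `is_mib` are hypothetical helpers. Every pair of disjoint, non-empty proper subsets is checked in both directions. The sample rows are invented.

```python
from itertools import combinations, product


def determines(rows, lhs, rhs):
    """lhs→rhs holds in the tabulation (lhs and rhs are tuples of attribute names)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


def independent(rows, lhs, rhs):
    """lhs↮rhs: neither side determines the other."""
    return not determines(rows, lhs, rhs) and not determines(rows, rhs, lhs)


def is_mib(rows, attrs):
    """Every non-empty proper subset is independent of every disjoint one."""
    subsets = [s for k in range(1, len(attrs)) for s in combinations(attrs, k)]
    return all(independent(rows, a, b)
               for a, b in combinations(subsets, 2)
               if not set(a) & set(b))


# Hypothetical tabulations.
panel = [
    {"Pilot": "Ann", "Date": "d1", "Flight": 83},
    {"Pilot": "Ann", "Date": "d2", "Flight": 116},
    {"Pilot": "Bob", "Date": "d1", "Flight": 116},
    {"Pilot": "Bob", "Date": "d2", "Flight": 83},
]
cube = [dict(zip("xyz", bits)) for bits in product((0, 1), repeat=3)]

print(is_mib(panel, ("Pilot", "Date")))            # True
print(is_mib(panel, ("Pilot", "Date", "Flight")))  # False: (Date·Flight)→Pilot here
print(is_mib(cube, ("x", "y", "z")))               # True
```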

8.1.3 Irredundant multiattribute of R

'Irredundant' and 'nonredundant' are synonyms in English. Let 'subset [of R]', 'multiattribute' and 'multicolumn' be synonyms in our context. Therefore, apart from the "miβ" abbreviation, any combination of a word from each list means "irredundant multiattribute", for instance, "irredundant subset of R".

8.2 Multiattribute notation An instance of an irredundant multiattribute is written between parentheses, with an explicit list of attributes separated by middle dots, for example, (A·B·C). An ordered irredundant subset (for example, a primary key of a characteristic entity [1979Cod]) is written with angle brackets, for example, (〈A∊RA·B〉 as AB, C, ...).

8.2.1 Unchecked multiattribute

The current Codd notation, i.e. (A, B, C), remains useful for unchecked multiattributes.

8.3 Multiattribute miscellanea This section groups some complementary ideas on irredundant multiattributes of R.

8.3.1 Multiattribute corollary

A proper subset of an irredundant multiattribute A is always an irredundant subset of A, because A is recursively independent.

8.3.2 Multiattribute default

By default, any proper subset of R is irredundant, e.g. x⊊R is an irredundant proper subset.

8.3.3 Black-box variable

Each multiattribute "functions like a box, which is closed to the outside. You cannot see what is in them. In particular, the names of the variables used in the formulae are hidden away" [1999Kra]. For the origin of the black-box topic, please see Yourdon & Constantine [1979Y&C].

8.3.4 Irredundant attribute

'Irredundant attribute' is a pleonasm. All attributes of R are irredundant by definition, because they are indecomposable and have an atomic value.

8.4 Redundant multiattribute A redundant multiattribute is a nonkey subset of R that is not internally independent. This happens, for instance, when the designer extends a given irredundant nonkey subset with a new attribute that is functionally dependent on another nonkey attribute. However, the subset B that results from reducing a redundant multiattribute A may or may not be irredundant.

8.5 Multiattribute context Let R be an RM/T associative entity. Let x, y, z, w be individual attributes, for instance, the components of the primary key of R. Composing an RM/T polyadic associative entity [1979Cod] deals with individual attributes of R, so we cannot use variables that are subsets of R. The process starts with some single attributes and ends with a maximal independent subset of R that, following the example, will be the primary key of R.

8.5.1 Familiar multiattributes

Obviously, the multiattribute context applies to any current subset of R, be it a composite primary key, a composite candidate key or a composite descriptive subset.

8.5.2 Formal multiattributes

In the more general context of R normalization, the multiattribute context applies to any composite node of any FD composition that yields more complex database structures of R, which merit a normalizing refactoring in the form of an R schema of 3NF database relations.

8.6 From binary to polyadic relations Binary "relations can be introduced simply as classes of ordered couples, if we can contrive to define ordered couples. Clearly, any definition will serve this purpose if it makes for the distinctness of couples (x; y) and (z; w) in all cases except where x is z and y is w" [1953Qui]. "Let us construe in Kuratowski's [1920Kur] fashion, the ordered pair 〈x, y〉 as {{x}, {x, y}}. The above treatment of dyadic relations is immediately extensible to relations of any higher degree. For, a triadic relation of x, y, and z can be treated as a dyadic relation of x to the couple 〈y, z〉" [1954Qui], i.e. 〈x, 〈y, z〉〉. "Tetradic relations could be handled on the basis of triadic ones in similar fashion" [1981Qui], i.e. 〈x, 〈y, z, w〉〉. "Similarly for pentadic relations, hexadic ones, and so on" [1981Qui].
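Kuratowski's pair and Quine's nesting translate directly into Python frozensets. The small sketch below uses illustrative function names. It shows that the construction distinguishes 〈x, y〉 from 〈y, x〉, and that a triadic element can be built as the dyadic 〈x, 〈y, z〉〉.

```python
def kpair(x, y):
    """Kuratowski's ordered pair <x, y> = {{x}, {x, y}}."""
    return frozenset({frozenset({x}), frozenset({x, y})})


def first(pair):
    """The first coordinate is the element common to every member of the pair."""
    (x,) = frozenset.intersection(*pair)
    return x


def second(pair):
    """The second coordinate is the remaining element (or x again when x = y)."""
    x = first(pair)
    rest = frozenset.union(*pair) - {x}
    return next(iter(rest)) if rest else x


def ktriple(x, y, z):
    """Quine: a triadic element treated as the dyadic <x, <y, z>>."""
    return kpair(x, kpair(y, z))


print(kpair(1, 2) == kpair(2, 1))                # False: the order is recorded
print(first(kpair(1, 2)), second(kpair(1, 2)))   # 1 2
print(second(second(ktriple("x", "y", "z"))))    # z
```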

8.7 Composing irredundant subsets An irredundant subset of R will take the form of a maximal independent subset of R, abbreviated Miß. Composing an irredundant subset involves two aspects:
(1) Quine's procedural definition of an ordered polyadic relation, starting from two complementary ordered binary relations, instantiated here by "x↛y" & "y↛x"; and
(2) how to specify any type of compact FD specs, including a step-by-step procedure for assembling the intermediate mißes.
Each single variable, e.g. x, is always an attribute. Each miß keeps the standard format, i.e. (x·y·z·…), throughout all the steps.

8.8 Construing a polyadic miß The ordering and the degree of the target polyadic miß influence the content of each step. Each step adds a new attribute to the left of the previous miß. Composing a maximally independent subset of R in Quine's fashion [1981Qui] involves a seed and three steps: /seed/ let x↮y ≝ {〈x,y〉: x↛y & y↛x}; /step 1/ (z·w) ≝ {z↮w}; /step 2/ (y·z·w) ≝ y↮(z·w); and /step 3/ (x·y·z·w) ≝ x↮(y·z·w); /and so on/. Procedure 1: Construing a miß (Quine)
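Procedure 1 can be rendered literally as code. This sketch assumes the tabulated rows are Python dictionaries and uses hypothetical helper names. It seeds with the last two attributes and prepends one attribute per step, accepting it only when it is independent of the miß built so far. It follows the Quine-style steps only and does not run the full compact check of every split of the miß.

```python
from itertools import product


def determines(rows, lhs, rhs):
    """lhs→rhs holds in the tabulation (tuples of attribute names)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


def independent(rows, lhs, rhs):
    return not determines(rows, lhs, rhs) and not determines(rows, rhs, lhs)


def construe_mib(rows, attrs):
    """Procedure 1, literally: seed (z·w) ≝ z↮w, then prepend one attribute per step."""
    *rest, z, w = attrs  # needs at least two attributes
    if not independent(rows, (z,), (w,)):
        return None
    mib = (z, w)
    for a in reversed(rest):  # step 2 adds y, step 3 adds x, and so on
        if not independent(rows, (a,), mib):
            return None
        mib = (a,) + mib
    return mib


# Hypothetical tabulations.
cube = [dict(zip("xyz", bits)) for bits in product((0, 1), repeat=3)]
panel = [
    {"Pilot": "Ann", "Date": "d1", "Flight": 83},
    {"Pilot": "Ann", "Date": "d2", "Flight": 116},
    {"Pilot": "Bob", "Date": "d1", "Flight": 116},
    {"Pilot": "Bob", "Date": "d2", "Flight": 83},
]
print(construe_mib(cube, ("x", "y", "z")))                # ('x', 'y', 'z')
print(construe_mib(panel, ("Pilot", "Date", "Flight")))  # None: (Date·Flight)→Pilot
```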

8.9 Assemble a triadic miß In the FD independent world, assembling "(x·y)↣z" is defined as: IF {(x·y); x↮z; y↮z; (x·y)↣z} THEN (x·y)↣z. Assembling "(x·y·z)", however, has two steps:
(1) IF {(x·y); x↮z; y↮z; (x·y)↮z} THEN (x·y)↮z;
(2) IF {(x·y)↮z; (x·z)↮y; (y·z)↮x} THEN (x·y·z).
Finally, the (x·y·z) definition with compact FDs is as follows: (x·y·z) ≝ {(x·y)↮z; (x·z)↮y; (y·z)↮x}. This last definition develops the full independence structure of the (x·y·z) miß. It displays the underlying set of compact independencies that are hidden (and saved) in the familiar and appealing miß defined in Quine's fashion.

8.10 Dependency structure After analyzing the R tabulation, we obtain a dependency structure. A dependency structure is a set of dependency sockets "involving all attributes" [1971Hea] of the inspected R table.


8.10.1 Dependency nodes Each distinct point is a singleton attribute or a non-empty subset of R attributes behaving as a single attribute, in the form of a "maximal functionally independent subset" [1971Cod].

8.10.2 Dependency connectors The dependency connectors are two-way lines of the unified family of functional dependencies [1971Cod] & [1977Fag] that run under the label "Functional dependency 2.1".

8.10.3 Dependency brick A dependency brick is the composition of two adjacent nodes with a dependency socket, for instance, (Flight·Pilot)↣Date.

8.10.4 Condensed structure The condensed or 'deep' structure includes all points and all lines linking adjacent nodes, i.e. redundant lines have been suppressed by transitive reduction [2015WTR].
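Transitive reduction, which yields the condensed structure, is easy to sketch for a DAG. An edge u→v is redundant when v can still be reached from another successor of u. The node labels below are only strings, and the function name is illustrative.

```python
def transitive_reduction(edges):
    """Keep the edges of a DAG that are not implied by a longer path."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)

    def reachable(start):
        """All nodes reachable from start through one or more edges."""
        seen, stack = set(), [start]
        while stack:
            for nxt in succ.get(stack.pop(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    return {(u, v) for u, v in edges
            if not any(v in reachable(w) for w in succ[u] if w != v)}


panel_edges = {
    ("Pilot·Date", "Flight"),
    ("Flight", "Destination·Time"),
    ("Pilot·Date", "Destination·Time"),  # implied by the two edges above
}
print(sorted(transitive_reduction(panel_edges)))
```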

8.10.5 Tree from One Source digraph Such a structure is always a "Tree from One Source" [1965Har], i.e. a Directed Acyclic Graph (digraph) [1965Har] & [1983Mai] of the mentioned type. In this type of digraph, each [dependent] point has only one in-line from one [determinant] point, and each [determinant] point has 0:n out-lines towards 0:n [dependent] points [1965Har].

8.10.6 FD is a Codd RM research The Database Dependency Theory and the examples of such dependency points and lines were first presented by Codd [1971Cod].

8.11 Condensed dependency structure Each node of a condensed dependency structure is an attribute or a "maximally independent subset of R” (abbr. miβ), formally “a maximal functionally independent subset” [1971Cod] of R. Each line of a condensed dependency structure is a two-way connector, one of the four on the Functional dependency 2.1 code card of the previous section, although only the following two are visible lines: (x)↣(y) and (x)↔(y).

8.13 Dependency structure of PANEL When there are several candidates for being the determinator or the main determinant, the primitive dependency specs undergo an important enrichment at the hands of the owner of the initial relation R. In this case, the normalizer or review inspector looks for an easily interpretable structure syntax (see: Representation principle [1982Zan]) derived from the primitive specs. In the case of the PANEL tabulation, we have found the following steps: Primitive dependency specs; Determinator/PK selection; Determinant/PK selection; Representable dependency structure.

8.13.1 Primitive dependency specs PANEL: {Flight↔(Destination·Time); (Flight·Pilot)↣Date; (Pilot·Date)↣Flight; (Date·Flight)↣Pilot}.

8.13.2 Determinator/PK selection The PANEL subset {(Flight·Pilot)↣Date; (Pilot·Date)↣Flight; (Date·Flight)↣Pilot} is equivalent to {(Pilot·Date)↔(Date·Flight)}. In this case, {(Flight·Pilot)↣Date NOR (Pilot·Date)↣Flight NOR (Date·Flight)↣Pilot} yields (Pilot·Date)↣Flight after determinator designation.

8.13.3 Determinant/PK selection The main determinant is selected in the same way as the determinator. PANEL subset: {Flight↔(Destination·Time)}. In the case of a single existing determinant, just "IF Flight↔(Destination·Time) THEN Flight" is enough to obtain the next primitive FD: Flight↣(Destination·Time).

8.13.4 Representable FD structure Determinator and all codeterminants (the future alternate keys) vanish, yielding a representable FD structure, for instance: PANEL: {(Pilot·Date)↣(Flight)↣(Destination·Time)}.
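The representable structure suggests splitting PANEL into two relations. This sketch uses invented rows and two hypothetical relation names: ASSIGNMENT(Pilot, Date, Flight) and FLIGHT(Flight, Destination, Time). It projects PANEL onto both and checks that their natural join gives PANEL back. That outcome is expected because Flight↣(Destination·Time) holds in the sample, which is Heath's decomposition condition [1971Hea].

```python
def project(rows, attrs):
    """Projection R[attrs] as a set of tuples (duplicates vanish)."""
    return {tuple(row[a] for a in attrs) for row in rows}


def natural_join(left, left_attrs, right, right_attrs):
    """Natural join of two projections given as sets of tuples."""
    common = [a for a in left_attrs if a in right_attrs]
    out_attrs = list(left_attrs) + [a for a in right_attrs if a not in left_attrs]
    joined = set()
    for lt in left:
        lrow = dict(zip(left_attrs, lt))
        for rt in right:
            rrow = dict(zip(right_attrs, rt))
            if all(lrow[a] == rrow[a] for a in common):
                merged = {**lrow, **rrow}
                joined.add(tuple(merged[a] for a in out_attrs))
    return out_attrs, joined


# Hypothetical PANEL tabulation.
PANEL = [
    {"Pilot": "Ann", "Flight": 83, "Destination": "JFK", "Time": "11:30", "Date": "Aug 9"},
    {"Pilot": "Bob", "Flight": 83, "Destination": "JFK", "Time": "11:30", "Date": "Aug 10"},
    {"Pilot": "Ann", "Flight": 116, "Destination": "SLC", "Time": "12:20", "Date": "Aug 10"},
    {"Pilot": "Cid", "Flight": 281, "Destination": "ORD", "Time": "17:30", "Date": "Aug 9"},
]
ASSIGNMENT_ATTRS = ("Pilot", "Date", "Flight")    # (Pilot·Date)↣Flight
FLIGHT_ATTRS = ("Flight", "Destination", "Time")  # Flight↣(Destination·Time)

assignment = project(PANEL, ASSIGNMENT_ATTRS)
flight = project(PANEL, FLIGHT_ATTRS)
out_attrs, rejoined = natural_join(assignment, ASSIGNMENT_ATTRS, flight, FLIGHT_ATTRS)
print(rejoined == project(PANEL, out_attrs), len(flight))  # True 3
```

The FLIGHT projection stores the destination and time of flight 83 once instead of twice, which is the redundancy that the representable structure removes.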

8.12 Functional dependency 2.2 specs

Normalizing PANEL (Pilot, Flight, Destination, Time, Date) from the already known tabulation starts by formalizing some of Maier's restrictions [1983Mai] in FD 2.2:
1. "For each flight there is exactly one time", i.e. Flight↣Time.
2. For each flight there is exactly one destination, i.e. Flight↣Destination.
3. Destination↮Time; -- observed.
4. (Destination·Time) ≝ (Destination↮Time).
5. Flight→(Destination·Time).
6. (Destination·Time)→Flight.
7. Flight↔(Destination·Time); -- 1st primitive FD.
8. IF Flight↔(Destination·Time) THEN Flight↔(Destination·Time) NOR (Destination·Time)↔Flight;
9. Flight↣(Destination·Time); -- determinator selection.
A. Flight↮Pilot; -- observed.
B. Pilot↮Date; -- observed.
C. Date↮Flight; -- observed.
D. (Flight·Pilot) ≝ (Flight↮Pilot).
E. (Pilot·Date) ≝ (Pilot↮Date).
F. (Date·Flight) ≝ (Date↮Flight).
G. (Flight·Pilot)↣Date; -- observed.
H. (Pilot·Date)↣Flight; -- observed.
I. (Date·Flight)↣Pilot; -- observed.
J. IF (Flight·Pilot)↣Date NOR (Pilot·Date)↣Flight NOR (Date·Flight)↣Pilot THEN (Pilot·Date)↣Flight.
K. (Pilot·Date)↣Flight.
L. PANEL: {(Pilot·Date)↣Flight; Flight↣(Destination·Time)}.

9. BASE RELATION

Base relation means that it is original. It is the main object of RM and the unit of database design. However, a derived relation, which is the opposite of a base relation, can be known at the same time as its base relations, as part of functional analysis, in the form of report specs. With the ancillary concepts of functional dependency [1971Cod] known, with solid composite attributes, and following E. F. Codd [1970Cod] & [1971Cod], this chapter continues by defining the database relation and further design components. The sections of this report are the following: • Base relation; • Derived relation (SQL view); • Candidate key; • Minimal subset of R; • Primary key; • Alternate key; • Primary key properties (1); • Axiom of minimal PK; • Primary key properties (2); • Primary key properties (3); • Foreign key; • Primary key properties (4).

9.1 Base relation Base relation means that it is original: its tuples cannot be obtained indirectly through a query formula that mentions other base relations.


9.2 Derived relation (SQL view)

A derived table (aka data view and SQL VIEW) is a named database table whose shortest intension is a definition in terms of other database table(s) [1971Hea], e.g. ACTIVE_PROJECT(SUPPLY[Project]). In SQL, derived tables are simply called “views”. Usually, every functional report has its own underlying view.

9.5 Primary key

Primary key is abbreviated as PK. "A relation may possess more than one nonredundant" [1970Cod] "candidate key" [1971Cod]. "Whenever a relation has two or more nonredundant" [1970Cod] "candidate keys" [1971Cod], "one of them is arbitrarily selected and called the primary key of that relation" [1970Cod]. The arbitrary PK selection is both a right and a duty of the designer; for example, it can be the last "reason" of the "Representation principle" [1986Zan]. Obviously, this arbitrary PK selection is not an invitation for designers to suspend their own functional experience.

9.3 Candidate key Candidate key is abbreviated as CK. "Normally, one domain (or combination of domains) of a given relation has values which uniquely identify each element (n-tuple) of that relation. Such a domain (or combination) is called" [1970Cod] "a candidate key" [1971Cod]. "A candidate key" [1971Cod] "is nonredundant if it is either a simple domain (not a combination) or a combination such that none of the participating simple domains is superfluous in uniquely identifying each element" [1970Cod]. "A candidate key K of relation R is an irredundant combination of attributes (or a single attribute) of R with properties P1 and P2: P1(Unique identification): In each tuple of R the value of K uniquely identifies that tuple; P2(Minimality): No attribute in K can be discarded without destroying property P1" [1971Cod]. Each "candidate key has the property of identifying each row uniquely" [1990Cod]. A candidate key can be a single attribute or an irredundant multiattribute. The primary key is one of the candidate keys.
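Properties P1 and P2 can be checked against a tabulation. A sample of rows only shows what holds in that instance, so the sketch below can confirm that a combination is a key of these rows, not of the predicate. The helper names and the data are hypothetical.

```python
def is_unique(rows, attrs):
    """P1 on this tabulation: the values of attrs differ from row to row."""
    values = [tuple(row[a] for a in attrs) for row in rows]
    return len(set(values)) == len(values)


def is_candidate_key(rows, attrs):
    """P1 plus P2: unique, and dropping any single attribute destroys uniqueness."""
    # Checking single-attribute removals suffices, because uniqueness survives enlargement.
    return is_unique(rows, attrs) and all(
        not is_unique(rows, [a for a in attrs if a != dropped]) for dropped in attrs)


# Hypothetical PANEL tabulation.
PANEL = [
    {"Pilot": "Ann", "Flight": 83, "Destination": "JFK", "Time": "11:30", "Date": "Aug 9"},
    {"Pilot": "Bob", "Flight": 83, "Destination": "JFK", "Time": "11:30", "Date": "Aug 10"},
    {"Pilot": "Ann", "Flight": 116, "Destination": "SLC", "Time": "12:20", "Date": "Aug 10"},
    {"Pilot": "Cid", "Flight": 281, "Destination": "ORD", "Time": "17:30", "Date": "Aug 9"},
]
print(is_candidate_key(PANEL, ["Pilot", "Date"]))            # True
print(is_candidate_key(PANEL, ["Pilot", "Date", "Flight"]))  # False: P2 fails
```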

9.3.1 One CK always exists

"Obviously, there always exists at least one candidate key, because the combination of all attributes of R possesses property P1. It is then a matter of looking for a subset with property P2" [1971Cod]. Two properties of candidate keys can be deduced from P1 and P2:

9.3.2 Each attribute depends on each CK

"P3: Each attribute of R is functionally dependent on each candidate key of R" [1971Cod].

9.3.3 Structure of a multiattribute CK

"P4: The collection of attributes of R in a candidate key is a maximal functionally independent subset of R and no other attributes of R can be added without destroying this functional independence" [1971Cod].

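Codd's remark in 9.3.1 suggests a procedure: start from all attributes of R, which trivially have P1, and discard attributes while the remainder still determines the whole of R. The sketch below does this with the textbook attribute-closure algorithm, taking the PANEL dependencies of section 8.13.1 as input. The helper names are illustrative. The order in which attributes are tried decides which candidate key is reached. Trying Pilot first, for example, ends in (Destination·Time·Date), which is also a candidate key under these dependencies.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def reduce_to_key(schema, fds):
    """Start from all of R (P1 holds) and drop attributes while P1 survives (P2)."""
    key = list(schema)
    for a in schema:
        trial = [b for b in key if b != a]
        if closure(trial, fds) >= set(schema):
            key = trial
    return key


F = frozenset
PANEL_FDS = [
    (F({"Flight"}), F({"Destination", "Time"})),
    (F({"Destination", "Time"}), F({"Flight"})),
    (F({"Pilot", "Date"}), F({"Flight"})),
    (F({"Date", "Flight"}), F({"Pilot"})),
    (F({"Flight", "Pilot"}), F({"Date"})),
]
print(reduce_to_key(("Flight", "Destination", "Time", "Pilot", "Date"), PANEL_FDS))
print(reduce_to_key(("Pilot", "Flight", "Destination", "Time", "Date"), PANEL_FDS))
```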
9.6 Alternate key "Alternate key" is not a Codd term. However, once the primary key of R has been selected, 'AK' not only abbreviates the term 'alternate key'; it is also clearer than the circumlocution "a candidate key that is not the primary key".

9.6.1 Alternate key definition

"An alternate key is an attribute or attribute combination that obeys the same uniqueness and minimality constraints as the primary key but it is not the primary key” [1986Dat]. However, after PK selection, some restrictive properties of candidate keys no longer apply to them in their new alternate-key status.

9.6.2 AK component IS NOT NULL WITH DEFAULT

In any tuple, the attribute components of a current AK are allowed to take a default value without any restriction. For instance, an SQL UNIQUE index smoothly enforces the UNIQUE constraint.

9.6.3 AK component IS NULL

"Any tuple [of a current AK] is allowed to have undefined value for the attribute components" [1971Cod]. However, the SQL UNIQUE index «does not enforce the constraint 'unique unless NULL' —it enforces the constraint 'unique, with at most one NULL'» [1986Dat].

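Date's remark describes the SQL products of 1986, and behavior still varies between products. The Python sketch below uses the standard `sqlite3` module. SQLite treats NULLs as distinct from each other in a UNIQUE constraint, so two rows may both leave the alternate key empty, while a repeated non-NULL value is rejected. The EMP table is hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical table: EmpNo is the PK, Badge is an alternate key that may be NULL.
con.execute("CREATE TABLE EMP (EmpNo INTEGER PRIMARY KEY, Badge TEXT UNIQUE)")
con.execute("INSERT INTO EMP VALUES (1, NULL)")
con.execute("INSERT INTO EMP VALUES (2, NULL)")  # accepted: NULLs count as distinct here
con.execute("INSERT INTO EMP VALUES (3, 'B-7')")
try:
    con.execute("INSERT INTO EMP VALUES (4, 'B-7')")  # repeated non-NULL AK value
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

null_badges = con.execute("SELECT COUNT(*) FROM EMP WHERE Badge IS NULL").fetchone()[0]
print(null_badges, duplicate_rejected)  # 2 True
```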
9.4 Minimal subset of R

To be minimal, a subset of R must have two properties: (1) it must be an irredundant multiattribute of R; and (2) it must be functionally irreducible. For instance, an irredundant multicolumn candidate key K is minimal because "P2: No attribute in K can be discarded without destroying" [1971Cod] the uniqueness. In fact, subset minimality is a functional "steady state".

9.4.1 CK minimality is a steady state

The minimal size of a CK influences its functionality, for example: if its number of attributes is reduced, the candidate key loses uniqueness; if it is increased, the CK loses minimality.

9.4.2 Nonkey minimality is a steady state

The minimal size of the nonkey domain of R influences its functionality, for example: if its number of attributes is reduced, the nonkey domain loses completeness; if its size is increased, it loses either full key dependency or irredundancy.

9.4.3 Any attribute is minimal

As expected, any prosaic attribute or singleton (i.e. a subset of a single attribute) of R is always irredundant and minimal by definition.

9.7 Primary key properties (1) This section deals with independent properties of the primary key: (1) PK existence; (2) PK uniqueness.

9.7.1 PK existence

The first property of the primary key is its very existence. (a) The primary key exists because it is the singular member of the set of candidate keys, which guarantees that this set is non-empty. (b) A primary key always exists in R because it represents the subject of the predicate in reality. Bear in mind that the representation of a "subject" may, in some cases, need more than one attribute; for example, the subject of a characteristic entity [1979Cod] uses a "definite description" [1910R&W] that needs two or more attributes.

9.7.2 PK uniqueness

The uniqueness of the primary key is part of the Entity Integrity of RM: • "Primary key distinguishes that row from every other row in its base table" [1990Cod]; in other words, • "No two rows of R have the same" [1979Cod] PK-value.

9.8 Axiom of minimal PK The Axiom of minimal PK is the following: any multiattribute primary key of R in first normal form is minimal (1). Axiom (1) refers only to multiattribute PKs because a monoattribute PK is always minimal by the definition of attribute.


Caveat. We do not know how each current SQL product enforces the minimality of a designated multiattribute primary key when it is declared.
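As an illustration of the caveat, SQLite (through Python's `sqlite3`) accepts a composite PRIMARY KEY that is not minimal. It only enforces uniqueness of the declared combination, so the designer remains responsible for minimality. The EMP table is hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical EMP table: EmpNo alone should identify an employee,
# yet the declared PK also includes Name, so it is not minimal.
con.execute("CREATE TABLE EMP (EmpNo INTEGER NOT NULL, Name TEXT NOT NULL, "
            "PRIMARY KEY (EmpNo, Name))")
con.execute("INSERT INTO EMP VALUES (1, 'Ann')")
con.execute("INSERT INTO EMP VALUES (1, 'Anne')")  # accepted: only the pair must be unique
same_emp = con.execute("SELECT COUNT(*) FROM EMP WHERE EmpNo = 1").fetchone()[0]
print(same_emp)  # 2
```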

9.8.1 Postulate of minimal CK

An example of a surrogate key [1979Cod] is the "Assign#" attribute, added to the original R (Emp#, Project#) to make this associative entity easy to reference, as in the following figure by Codd [1979Cod].

The CK postulate generalizes axiom (1) in the following way: any multiattribute candidate key of R in first normal form is also minimal, because it is the source of any primary key (2), which is always minimal according to (1).

9.8.2 Corollary on minimal CK

Any multiattribute CK in 1NF is also minimal (3). Any multiattribute CK in 1NF is also irredundant for the ⊋ the (4).

9.8.3 Corollary on PK minimality

The primary key, being a singular candidate key, inherits the corresponding properties of being irredundant and minimal (5). Corollary (5) is the converse formulation of Axiom (1).

9.9 Primary key properties (2) This section deals with independent properties of the primary key: (3) Each PK component IS NOT NULL; (4) Each PK component has a genuine value.

9.9.1 Each PK component IS NOT NULL

"Rule 1 (entity integrity): No primary key value of a base relation is allowed to be null or to have a null component" [1979Cod]. In other words, there is a value per PK component, i.e. in every row, each PK component always has a value. Every PK component, being NOT NULL, cooperates in identifying each row. If some component of a composite PK had a mark instead of a value, the PK of R would lose the uniqueness property for that row.

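Rule 1 is not automatic in every product. SQLite documents that, for historical reasons, PRIMARY KEY does not imply NOT NULL in ordinary (rowid) tables unless the column is declared NOT NULL. The sketch, written with Python's `sqlite3`, shows the consequence described above (two rows with the same partly NULL key) and then the remedy. The table names are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# PRIMARY KEY without NOT NULL: SQLite's legacy behavior admits NULL components.
con.execute("CREATE TABLE LAX_ASSIGN (Emp TEXT, Project TEXT, PRIMARY KEY (Emp, Project))")
con.execute("INSERT INTO LAX_ASSIGN VALUES (NULL, 'P1')")
con.execute("INSERT INTO LAX_ASSIGN VALUES (NULL, 'P1')")  # same "key" again: uniqueness is lost
lax_rows = con.execute("SELECT COUNT(*) FROM LAX_ASSIGN").fetchone()[0]

# Rule 1 made explicit: every PK component is declared NOT NULL.
con.execute("CREATE TABLE ASSIGN (Emp TEXT NOT NULL, Project TEXT NOT NULL, "
            "PRIMARY KEY (Emp, Project))")
try:
    con.execute("INSERT INTO ASSIGN VALUES (NULL, 'P1')")
    null_rejected = False
except sqlite3.IntegrityError:
    null_rejected = True

print(lax_rows, null_rejected)  # 2 True
```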
9.9.2 Each PK component has a genuine value

In other words, NOT NULL WITH DEFAULT is avoided in the PK components. In every row, each PK component always has a value coming from the reality.

9.10 Primary key properties (3) The following primary key features merit their own entries: (5) Exactly one primary key; (6) When the designer creates the PK; (7) Each entity type, its identifier type;

9.10.1 Exactly one primary key "Each base table has exactly one primary key" [1990Cod], i.e. ∀R ∃!PK. This rule reinforces the role of the PK as the subject of the underlying predicate of R. Besides, there is no functional reason supporting the idea of a secondary key. This pseudo-enhancement to the relational model was discussed and closed by Chris Date in 1986 [1986Dat]. A secondary key can be introduced under other names; for instance, a secondary key appears as UNIQUE and CANDIDATE KEY in SQL. Declaring, as part of CREATE TABLE, a key beyond the PRIMARY KEY implies (under the covers) the physical creation of an index. But such an index, which is part of physical design, starts out based only on imagined future application traffic. The idea of a secondary key is part of the meme that data design can influence future application performance. Please see the Normalization vs. application performance topic in the following sections: DBMS performance vs. 3NF, Occasionally Homer takes a nap, and Normally Homer is Homer.

9.10.2 When designer creates the PK Creating a surrogate key (abbr. SK) is backed by good functional reasons, and it is part of the designer's duties. The main motivation for this designer initiative is "the difficulty of specifying a cross reference to a particular association when it has no surrogate identifying it uniquely" [1979Cod].

The second idea behind a surrogate key is to facilitate data manipulation in the software components, for instance, by substituting a standard fixed-length surrogate key for a PK that is too big or of variable length. As stated, the surrogate key is a genuine reserve primary key that can substitute for a complex natural primary key.
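Codd's Assign# example can be sketched in SQL through Python's `sqlite3`. The column names (AssignNo, EmpNo, ProjectNo) are hypothetical stand-ins for Assign#, Emp# and Project#. In SQLite an INTEGER PRIMARY KEY column is filled automatically. The natural key is kept as a UNIQUE alternate key, so the surrogate does not hide duplicate associations.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE ASSIGNMENT (
        AssignNo  INTEGER PRIMARY KEY,   -- surrogate key, assigned by SQLite
        EmpNo     INTEGER NOT NULL,
        ProjectNo INTEGER NOT NULL,
        UNIQUE (EmpNo, ProjectNo)        -- the natural key survives as an alternate key
    )""")
first_id = con.execute("INSERT INTO ASSIGNMENT (EmpNo, ProjectNo) VALUES (7, 42)").lastrowid
second_id = con.execute("INSERT INTO ASSIGNMENT (EmpNo, ProjectNo) VALUES (7, 43)").lastrowid
try:
    con.execute("INSERT INTO ASSIGNMENT (EmpNo, ProjectNo) VALUES (7, 42)")
    natural_key_enforced = False
except sqlite3.IntegrityError:
    natural_key_enforced = True

print(first_id, second_id, natural_key_enforced)  # 1 2 True
```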

9.10.3 Each entity has its own PK type The functional properties of each type of RM/T entity include the characteristics of its own type of identifier. Please, see below, the "Palette of primary keys" section.

9.11 Foreign key "And a key axiom stating that there is a very special binary relation, called E(∊), which represents the membership relation on sets" [2006Mad]. "We shall call a domain of relation R a foreign key if it is not the primary key of R but its elements are values of the primary key of some relation" [1970Cod]. Foreign key is abbreviated as FK. Each time you mark an attribute (or an irredundant subset) of R as a foreign key, e.g. R.L∊S, for each distinct foreign key value in R.L there must "exist a matching primary key value from the same domain" [1985Cod] in the relation referenced by L, i.e. S, within the same database schema as R.

9.11.1 Properties of the foreign key 1) Before the attribute L is marked as FK in R, the home-set of L, e.g. RL, must exist; besides, the primary key of RL must be L, e.g. RL (L, B, C); 2) R is the referencing relation, e.g. R (P, J, L∊RL), and RL is the referenced relation; 4) A foreign key mark can apply: a) to a component of the primary key; b) to a descriptive attribute; or c) to a denotative candidate key.
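The FK properties above can be checked with SQLite through Python's `sqlite3`, using the document's relation names RL(L, B, C) and R(P, J, L∊RL). Note one SQLite-specific assumption: foreign keys are enforced only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when this pragma is on
con.execute("CREATE TABLE RL (L TEXT PRIMARY KEY, B TEXT, C TEXT)")                 # referenced
con.execute("CREATE TABLE R (P TEXT PRIMARY KEY, J TEXT, L TEXT REFERENCES RL (L))")  # referencing
con.execute("INSERT INTO RL VALUES ('l1', 'b', 'c')")
con.execute("INSERT INTO R VALUES ('p1', 'j', 'l1')")  # a matching PK value exists in RL
try:
    con.execute("INSERT INTO R VALUES ('p2', 'j', 'l9')")  # no 'l9' in RL
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(orphan_rejected)  # True
```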

9.11.2 Examples of referencing entities The measures of a data warehouse are kernel hierarchies whose composite PK is a set of independent components, each of which is a foreign key. Each component of the binary PK of an associative entity is always a foreign key; for instance, an associative entity AB (SubjectA∊A, SubjectB∊B) represents the many-to-many relationship between two different entities. A reflexive association is always many-to-many, and it relates a pair of different subjects of the same entity, for instance, BOM (Part∊MATERIAL as Part1, Part∊MATERIAL as Part2). There is also the case of an ordered, telescopic structure of FKs in a multiattribute PK of a characteristic entity, for instance, CH (〈〈〈P∊RP〉, F〉∊RPF, L〉∊RPFL, T, E, J).

9.11.3 Example of a referenced-only entity A set of pure subjects, i.e. R (P), is a class. Its semantics is that each participating element of R exists in reality, e.g. a class entity with the possible model colors of a bike factory. This type of monoattribute relation is a kernel entity. A kernel entity tends to be a referenced relation with its subset of immediate attributes, but some of these attributes can be foreign keys.


9.11.4 FK glues the schema A set of FK related relations is a relational schema with referential integrity. It is also customary speaking of a subschema or a microschema when the current schema is under construction or it is already part of another schema. The plural of schema is 'schemas' or 'schemata'.

9.12 Primary key properties (4) The fourth series of primary key features relates to the foreign key role of any single PK: (8) The PK cannot be FK in its own entity; (9) R primary key as descriptive attribute of S; (10) R primary key as PK component of S; (11) R primary key as denotative attribute of S.

9.12.1 R.PK cannot be FK in R entity The single primary key of R is a valid reference in any relation of its schema except in R. This occurs because each PK is the identifier of "an individual in reality" [1987A&S] but the identifier itself is not.

9.12.2 R.PK as descriptive attribute of S A single primary key of R can be a foreign component of the predicating part of S. This role is typical for a primary key of a kernel entity.

9.12.3 R.PK as part of S PK A single primary key of R can be a foreign component of a composite PK, for instance: - in the primary key of an associative entity S; and - in the left part of the PK of a characteristic entity S;

9.12.4 R.PK as denotative attribute of S In the relational model, one-to-one ERA relationships are represented by a simple foreign key that is a supervened candidate key. There are two possible one-to-one 3NF micro-schemas between entities R and S; please see 3NF CLASS ℂ and 3NF CLASS ⅅ for the details.

10. DATABASE INTEGRITY Database integrity aims to prevent unintentional changes to information in order to keep accurate and consistent every item of information along its entire life-cycle [2015WDI].

The overall intent of any data integrity initiative is to ensure that data is recorded exactly as each application system accepted it and, upon later retrieval, that the data item has the same value that was originally recorded [2015WDI] in the database. "The opposite of data integrity is data corruption, data loss" [2015WDI] and data redundancy [1971Cod]. Data integrity tasks start when the database is already operational, and a database is born with a humble "table load". This report has the following sections: • Initial table load; • Data security; • The datum; • Table integrity buttons; • Transaction concept; • RAS of a database; • Data integrity scope; • RM data integrity; • RM integrity constraints; • RM facilities; and • RM high quality services.

10.1 Initial table load Loading the initial rows of a currently empty table is one of the tasks of the database administrator (abbr. DBA), who performs it with the LOADER: a load utility, normally supplied with the DBMS product (be it relational or not), with several data compression options. The standardization of commercial r-DBMS products allows cross-vendor ("crossing") utilities to compete with the in-house LOADER of the reference r-DBMS. "All load utilities are designed to move large volumes of data, collected from data sources on channel and network-attached clients, into empty tables in the database" [2015WIL]. The LOADER "typically offers higher performance levels than a standard application program written to load data to an empty table because data allocation, conversion, movement, and loading are automatic and performed in parallel" [2015WIL]. Crossing LOADERs likewise run "from many client platforms, mainframe, or load server to move data into the empty table" [2015WIL]. After a fast, raw data load, the table holds many rows. The DBA finishes the initial load by indexing the table according to the notes agreed upon during its last stress test.
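SQLite has no separate LOADER utility, so the following is only an approximation under stated assumptions: Python's `sqlite3` `executemany` inside a single transaction plays the role of the raw load, and the secondary index (the hypothetical `CITY_NAME_IX`) is created only after the rows are in, as the DBA's finishing step. The CITY table and its generated rows are illustrative.

```python
import sqlite3

def initial_load(conn: sqlite3.Connection, rows) -> None:
    """Raw load into an empty table in one transaction, then build the secondary index."""
    with conn:  # one transaction: COMMIT on success, ROLLBACK on error
        conn.executemany("INSERT INTO CITY (CityCode, CityName) VALUES (?, ?)", rows)
    # Secondary index built once after the load instead of being maintained row by row
    conn.execute("CREATE INDEX CITY_NAME_IX ON CITY (CityName)")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CITY (CityCode TEXT NOT NULL PRIMARY KEY, CityName TEXT NOT NULL)")
rows = [(f"C{i:05d}", f"City {i}") for i in range(10_000)]  # hypothetical source data
initial_load(conn, rows)
print(conn.execute("SELECT COUNT(*) FROM CITY").fetchone()[0])  # 10000
```

The primary key index is still maintained during the load, because SQLite builds it with the table; only the secondary index is deferred.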

10.2 Data security Data security aims to prevent unauthorized access to information and to prevent any intentional insertions, modifications and deletions that are alien to corporate policies, whether or not the changes are performed by authorized people [2015WDS].

10.3 The datum "Each and every datum" [1985Cod] in the database is an atomic value [1970Cod], i.e. morphologically, syntactically and "semantically nondecomposable" [1987S&K]. Each database item is "logically accessible" [1985Cod] by "a combination of table-name, primary-key-name (value), and column-name" [1985Cod]. The datum is the object of interest of Data integrity and Data security.

10.4 Table integrity buttons After the initial load, a table starts its life-cycle, showing that it is a time-varying set of rows. The actors are the application programs together with a database schema, both already checked up to their functional limits. An application may go through a cycle of repairs, adaptations and enhancements over many years. However, the rows of every table follow the corresponding business cycle: being introduced, being changed (if needed) and being deleted when the owner's interest in them vanishes for business reasons. The programs reflect the daily, weekly, monthly, etc. business behavior (their own running), and they have four buttons influencing the row cycle and the item cycles of volatile business attributes, i.e. database integrity can suffer only at insertion, deletion, change or commit, although data loss or data corruption is discovered later, during database queries. Thus the potential protagonists of a breach of table integrity are: • INSERT statement; • UPDATE statement; • DELETE statement; and • COMMIT transaction statement.

10.4.1 INSERT statement An SQL INSERT statement adds a new row to an existing table as part of a database transaction.

10.4.2 UPDATE statement An SQL UPDATE statement changes the value of an attribute of an existing row of a table. It implies a previous SELECT, which reads and holds that row, so that the change (and the other changes, if any) can be performed as part of a database transaction.

Tip: A good MTBF target is 99.999% of the Unit-of-time doing service, although 99.9999% and 99.99% are also used [1997SUN] (see Figure 8).

10.4.3 DELETE statement The SQL DELETE statement removes an existing row (or every row matching its search condition) from a table. It is also part of a database transaction protocol similar to that of the UPDATE statement. INSERT is outside this read-then-change protocol (obviously, it cannot read a future row).

10.4.4 COMMIT statement The COMMIT transaction statement closes any combination of the three statements above, acting on any combination of: • several tables; • the intervening rows of the target tables; and • the attribute values of the current rows; all of it on an unstable machine.

10.5 Transaction concept In his seminal work on the transaction concept [1981Gra], Jim Gray states the following: "A transaction is a transformation of state which has the properties of: • Consistency: The transaction must obey legal protocols; • Atomicity: It either happens or it does not; either all are bound by the contract or none are. • Durability: Once a transaction is committed, it cannot be abrogated (its effects survive failures)" (Jim Gray, "The Transaction Concept: Virtues and Limitations").
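Atomicity, in Gray's sense of "either all are bound by the contract or none are", can be observed with a two-statement transfer. This is a minimal sketch, assuming SQLite via Python's `sqlite3`, whose connection context manager commits on success and rolls back on an exception; the ACCOUNT table is hypothetical, its CHECK constraint stands in for a "legal protocol", and Durability cannot be shown with an in-memory database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ACCOUNT (AccNr TEXT NOT NULL PRIMARY KEY, "
             "Balance INTEGER NOT NULL CHECK (Balance >= 0))")
conn.executemany("INSERT INTO ACCOUNT VALUES (?, ?)", [("A", 100), ("B", 0)])
conn.commit()

def transfer(amount: int) -> bool:
    """Move amount from A to B as one transaction: both updates happen, or neither."""
    try:
        with conn:  # COMMIT if the block succeeds, ROLLBACK if it raises
            conn.execute("UPDATE ACCOUNT SET Balance = Balance + ? WHERE AccNr = 'B'", (amount,))
            conn.execute("UPDATE ACCOUNT SET Balance = Balance - ? WHERE AccNr = 'A'", (amount,))
        return True
    except sqlite3.IntegrityError:  # the CHECK fails on the second UPDATE
        return False

def balances() -> dict:
    return dict(conn.execute("SELECT AccNr, Balance FROM ACCOUNT"))

print(transfer(30), balances())   # True: A=70, B=30
print(transfer(500), balances())  # False: B's credit was rolled back, still A=70, B=30
```

In the second call the credit to B had already executed when the debit to A violated the constraint; the rollback undoes both, which is the atomicity property.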

10.5.1 Transaction & database integrity "The transaction concept is key to the structuring of data management applications" [1981Gra]. Here 'structuring of data' is understood in its real service: preserving the integrity of the data values of all the rows that take part in a transaction, whether or not the rows belong to the same table, and whether the tables belong to the same database or to different peer databases.

10.5.2 Transaction internals COMMIT is a modern and real demiurge, able to build a stable engine out of a set of unstable components [1956vNe]. Currently, the key of the invention is having (at least!) one hard disk (of a battery of n disks) ready during the time of one physical write to disk (a physical input/output disk operation, abbr. PI/O). The calculation of n still follows von Neumann's formulas of 1956 [1956vNe]; for him, n counted hundreds or thousands of vacuum tubes.
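The size of the battery can be estimated with a deliberately simplified model, which is NOT von Neumann's multiplexing analysis: assume each disk is independently unavailable during one PI/O with probability `p_down`, so at least one of n disks is ready with probability 1 - p_down^n. The function name and the figures below are hypothetical illustrations.

```python
def min_ready_disks(p_down: float, target: float) -> int:
    """Smallest n with P(at least one of n disks ready during one PI/O) >= target.

    Simplifying assumption: disks fail independently, each unavailable
    with probability p_down during the write.
    """
    if not (0 < p_down < 1 and 0 < target < 1):
        raise ValueError("probabilities must lie strictly between 0 and 1")
    n = 1
    while 1 - p_down ** n < target:  # P(all n disks down) is p_down ** n
        n += 1
    return n

print(min_ready_disks(0.01, 0.99999))  # 3
```

With a 1% chance of any single disk being unavailable, two disks give 99.99% and three give 99.9999%, so three are needed for the 99.999% target.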

10.6 RAS of a database «Two primary purposes of databases are to attenuate data redundancy and enhance data reliability» [1983Mai], and the reliability "experienced by other System/370 users is the result of a strategy based on RAS (Reliability·Availability·Serviceability)" [1970IBM]. Reliability, Availability and Serviceability are qualities of a unit of hardware or software already in production: • Reliability is complying with the specifications: no more and no less; • Availability is providing the service within a defined time; • Serviceability is spending much time doing service and little on "housekeeping", i.e. any activity other than doing service after product installation, such as starting, training, maintenance, recovery and control.

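10.7 Mean time between failures In this report, Mean Time Between Failures (MTBF) is taken as the average number of minutes doing service (Minutes-Doing-Service) per Unit-of-time (quarter, year), usually expressed as a percentage of that unit. It is the measure of the RAS of any hardware or software component. (In reliability engineering, MTBF strictly denotes the mean operating time between two consecutive failures; a percentage of time spent doing service is what is usually called availability.)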

Figure 4: RAS metrics (source [1991G&S])

10.7.1 Table RAS The RAS of a table is the shortest MTBF of the set of systems using a given table.

10.7.2 Database RAS The RAS of a database schema is the shortest MTBF found in the set of its tables.
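The percentages of the Tip translate into concrete minutes, and 10.7.1 and 10.7.2 both reduce to taking the weakest figure of a set. A minimal sketch, assuming a 365-day year and treating the report's MTBF as the fraction of the Unit-of-time spent doing service; the per-table figures are hypothetical.

```python
from typing import Mapping

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored

def downtime_minutes(service_fraction: float, unit_minutes: int = MINUTES_PER_YEAR) -> float:
    """Minutes per Unit-of-time NOT spent doing service, for a given service fraction."""
    return (1.0 - service_fraction) * unit_minutes

def weakest(figures: Mapping[str, float]) -> float:
    """Table RAS (over its systems) or database RAS (over its tables): the lowest figure."""
    return min(figures.values())

for target in (0.9999, 0.99999, 0.999999):
    print(f"{target:.4%} of a year -> {downtime_minutes(target):.2f} minutes without service")

# Hypothetical per-table figures for a two-table schema
schema = {"CITY": 0.99999, "SUPPLIER": 0.9999}
print("database RAS:", weakest(schema))  # 0.9999, set by SUPPLIER
```

At 99.999% a year leaves about 5.26 minutes without service; at 99.99% about 52.56 minutes.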

10.8 Data integrity scope Data integrity is a critical part of the Functional analysis, i.e. of the Data design [2015WDI] and of the report design (and its underlying data VIEW design). The scope of data integrity can exceed Data design, for instance: Physical database design; database operational usage such as backups, emergency plan and physical planning; database changes/enhancements (abbr. C/E's); and RM data inspection [1991Han].

10.9 RM data inspection Casual users, i.e. decision making & planning staff, can perform AD HOC queries on the production database looking for inaccurate, obsolete and incompatible data, in order to repair them immediately and to propose changes in the application programs and/or in the production tables that remove the cause of such troubles (these are the activities proposed by Hansen in his seminal paper "Zero defect data" of 1991 [1991Han]). These activities can also be extended to the data warehouse by defining volatile and complex cube queries on the statistical evolution of inaccurate, obsolete and incompatible operational data.
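A typical AD HOC inspection query looks for facts that disagree with themselves. This is a minimal sketch, assuming SQLite via Python's `sqlite3` and a hypothetical redundant table T(SupplierNr, PartNr, City) in which each supplier's city is repeated once per part; the query reports suppliers recorded with more than one city.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical redundant table: the supplier's City is repeated once per part
conn.execute("CREATE TABLE T (SupplierNr TEXT NOT NULL, PartNr TEXT NOT NULL, City TEXT, "
             "PRIMARY KEY (SupplierNr, PartNr))")
conn.executemany("INSERT INTO T VALUES (?, ?, ?)", [
    ("S1", "P1", "London"), ("S1", "P2", "Paris"),   # inconsistent copies
    ("S2", "P1", "Athens"), ("S2", "P3", "Athens"),  # consistent copies
])

# Ad hoc inspection: suppliers whose repeated City values disagree
suspects = conn.execute("""
    SELECT SupplierNr, COUNT(DISTINCT City) AS Versions
    FROM T
    GROUP BY SupplierNr
    HAVING COUNT(DISTINCT City) > 1
""").fetchall()
print(suspects)  # [('S1', 2)]
```

The query finds the symptom; removing the cause is the normalization proposal discussed in DATABASE REDUNDANCY.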

10.10 Codd's twelve rules of 1985
Rule 1: The information rule
Rule 2: Guaranteed access rule
Rule 3: Systematic treatment of NULL values
Rule 4: Dynamic online catalog based on the RM
Rule 5: Comprehensive data sublanguage
Rule 6: View updating
Rule 7: High-level Insert, Update, and Delete
Rule 8: Physical data independence
Rule 9: Logical data independence
Rule 10: Integrity independence
Rule 11: Distribution independence
Rule 12: Non-subversion
Summary 1: Codd r-DBMS rules (1985)

10.11 RM data integrity The basic r-DBMS rules underpinning data integrity are the following [1985Cod]: • Information rule; • Atomic datum coordinates; and • Comprehensive data sublanguage.

10.11.1 Information rule "All information in the relational database is represented in exactly one and only one way —by values in tables" [1985Cod].

10.11.2 Atomic datum coordinates "Each and every datum (atomic value) is guaranteed to be logically accessible by resorting to the following trinomial" [1985Cod]: 〈table-name, primary-key-name (value), attribute-name〉
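The trinomial can be read as an access function. This is a minimal sketch, assuming SQLite via Python's `sqlite3` and a single-column primary key; `get_datum` and `quote_identifier` are hypothetical helpers. Because SQL cannot bind table or column names as parameters, the names are quoted as identifiers and only the key value is bound.

```python
import sqlite3

def quote_identifier(name: str) -> str:
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'

def get_datum(conn: sqlite3.Connection, table: str, pk_column: str, pk_value, column: str):
    """Return the atomic value addressed by <table-name, primary-key (value), column-name>."""
    sql = (f"SELECT {quote_identifier(column)} FROM {quote_identifier(table)} "
           f"WHERE {quote_identifier(pk_column)} = ?")
    row = conn.execute(sql, (pk_value,)).fetchone()
    if row is None:
        raise KeyError(pk_value)  # no row carries this primary key value
    return row[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CITY (CityCode TEXT NOT NULL PRIMARY KEY, CityName TEXT NOT NULL)")
conn.execute("INSERT INTO CITY VALUES ('NY', 'New York')")
print(get_datum(conn, "CITY", "CityCode", "NY", "CityName"))  # New York
```

Raising `KeyError` for a missing row keeps "no such row" distinct from a datum that happens to be NULL.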

10.11.3 Comprehensive data sublanguage "There must be at least one language whose statements are expressible, per some well-defined syntax, as character strings and whose ability to support all of the following is comprehensible: 1. Data definition; 2. View definition; 3. Data manipulation (interactive and by program); 4. Integrity constraints; and 5. Transaction boundaries (Begin, Commit, Rollback)” [1985Cod].

10.12 RM integrity constraints The database language should support integrity constraints that restrict the data that can be entered into the database and the database modifications that can be made [1985Cod]. The essential r-DBMS data integrity constraints are the following: Entity integrity (primary key); Referential integrity (foreign key); Definable user integrity; and the Non-subversion rule.

10.12.1 Entity integrity (primary key) "No component of a primary key is allowed to have a NULL value. Any r-DBMS must control the primary key explicit definition on NOT NULL components and the uniqueness of each row based on the key value at physical level, e.g. with an index" [1985Cod].
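Both halves of entity integrity (no NULL key component, one row per key value) can be checked in a few lines. This is a minimal sketch, assuming SQLite via Python's `sqlite3` and a hypothetical PART table. The explicit NOT NULL matters here: SQLite documents that, for legacy reasons, a non-INTEGER PRIMARY KEY column in an ordinary table accepts NULL unless NOT NULL is declared, which echoes the "explicit definition on NOT NULL components" above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL declared explicitly: SQLite would otherwise accept NULL in this TEXT key column
conn.execute("CREATE TABLE PART (PartNr TEXT NOT NULL PRIMARY KEY, PartName TEXT NOT NULL)")
conn.execute("INSERT INTO PART VALUES ('P1', 'Bolt')")

def try_insert(part_nr, name) -> str:
    """Attempt an insert and report whether entity integrity let it through."""
    try:
        conn.execute("INSERT INTO PART VALUES (?, ?)", (part_nr, name))
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"

print(try_insert(None, "Nut"))    # rejected: NULL primary key component
print(try_insert("P1", "Screw"))  # rejected: duplicate primary key value
print(try_insert("P2", "Nut"))    # accepted
```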

10.12.2 Referential integrity (foreign key) "Some deletions and updates may be triggered by others, if deletion and update dependencies between specified relations are declared in R" [1970Cod]. "Any r-DBMS must support the concept of a foreign key and the corresponding Data Definition Language (abbr. DDL) facility, i.e. 1. A table R can build its primary key components from values coming from the primary keys of other tables. 2. A primary key value of some table in the database can be the source of values of any nonkey attribute of R. Any r-DBMS engine must enforce that, for each distinct nonNULL foreign key value in a relational database, there exists a matching primary key value from the same domain" [1985Cod].
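The "each distinct nonNULL foreign key value" clause can be observed directly. This is a minimal sketch, assuming SQLite via Python's `sqlite3` with foreign key enforcement switched on; CITY and SUPPLIER are hypothetical tables, and the foreign key is deliberately nullable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
CREATE TABLE CITY (CityCode TEXT NOT NULL PRIMARY KEY, CityName TEXT NOT NULL);
CREATE TABLE SUPPLIER (
    SupplierNr TEXT NOT NULL PRIMARY KEY,
    CityCode   TEXT REFERENCES CITY (CityCode)  -- nullable foreign key
);
INSERT INTO CITY VALUES ('NY', 'New York');
""")

def try_insert(supplier_nr, city_code) -> str:
    """Attempt an insert and report whether referential integrity let it through."""
    try:
        conn.execute("INSERT INTO SUPPLIER VALUES (?, ?)", (supplier_nr, city_code))
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"

print(try_insert("S1", "NY"))  # accepted: a matching CITY row exists
print(try_insert("S2", None))  # accepted: a NULL foreign key value is not checked
print(try_insert("S3", "LA"))  # rejected: no CITY row with CityCode 'LA'
```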

10.12.3 Definable user integrity "Integrity constraints specific to a particular relational database must be definable in the relational data sublanguage and storable in the catalog" [1985Cod], not in the application programs. Current strong candidates for user integrity definitions are: the so-called "metadata" of current data warehouses, the grouped/grouping table schemata, the optionality side in some associative entities, the Unit-Of-Measure (abbr. UOM) [1999Cor] of some columns, and the unavoidable Inclusion dependencies [1990Cod], etc.
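A UOM rule is a simple example of a user constraint that can live in the catalog rather than in the programs. This is a minimal sketch, assuming SQLite via Python's `sqlite3`, where CHECK constraints are part of the stored table definition in `sqlite_master`; the PART table and the "weights are positive kilograms" rule are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical user rule: Weight is recorded in kilograms ('kg') and is positive
conn.execute("""
CREATE TABLE PART (
    PartNr    TEXT NOT NULL PRIMARY KEY,
    Weight    REAL NOT NULL CHECK (Weight > 0),
    WeightUOM TEXT NOT NULL CHECK (WeightUOM = 'kg')
)""")

# The rule is stored in the catalog, not in any application program
ddl = conn.execute("SELECT sql FROM sqlite_master WHERE name = 'PART'").fetchone()[0]
print("WeightUOM = 'kg'" in ddl)  # True

def accepts(part_nr: str, weight: float, uom: str) -> bool:
    """True if the catalog rule lets the row in."""
    try:
        conn.execute("INSERT INTO PART VALUES (?, ?, ?)", (part_nr, weight, uom))
        return True
    except sqlite3.IntegrityError:
        return False

print(accepts("P1", 12.5, "kg"), accepts("P2", 12.5, "lb"))  # True False
```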

10.12.4 Non-subversion rule "If a relational system has or supports a low-level language, that low-level language cannot be used to subvert or bypass the integrity rules or constraints expressed in the database catalog" [1985Cod].


10.13 RM facilities Data integrity is a horizontal topic that crosses most of the database facilities and services offered by an r-DBMS engine compliant with the following RM rules of E. F. Codd: View updating rule; Physical data independence; Logical data independence; and Distribution transparency.

10.13.1 View updating rule "All views that are theoretically updateable are also updateable by the system" [1985Cod].

10.13.2 Physical data independence "Application programs and terminal activities remain logically unimpaired whenever any changes are made in either storage representation or access methods" [1985Cod].

10.13.3 Logical data independence "Application programs and terminal activities remain logically unimpaired when table maintenance consists in adding new columns" [1985Cod].
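Adding a column leaves a program that names its columns logically unimpaired. This is a minimal sketch, assuming SQLite via Python's `sqlite3`; the CITY table, the `QUERY` string standing in for an application program, and the new Population column are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CITY (CityCode TEXT NOT NULL PRIMARY KEY, CityName TEXT NOT NULL)")
conn.execute("INSERT INTO CITY VALUES ('NY', 'New York')")

# An "application program" that names its columns explicitly
QUERY = "SELECT CityCode, CityName FROM CITY ORDER BY CityCode"
before = conn.execute(QUERY).fetchall()

# Table maintenance: a new column is added to the base table
conn.execute("ALTER TABLE CITY ADD COLUMN Population INTEGER")

after = conn.execute(QUERY).fetchall()
print(before == after)  # True: the program sees exactly what it saw before
```

A program written with `SELECT *` would now receive a third column, so the independence holds for programs that list their columns.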

10.13.4 Distribution transparency "The data manipulation sublanguage of a relational DBMS must enable application programs and terminal activities to remain logically unimpaired whether and whenever data are physically centralized or distributed" [1985Cod]. "This rule says that the database language must be able to manipulate data located on other computer systems. In essence, we should be able to split the data on the r-DBMS out onto multiple physical systems without the user realizing it" [1985Cod].

10.14 RM high quality services RM high quality services take advantage of all the previous data integrity rules and facilities, which have really been implemented in many commercial and open-source r-DBMS products. They also owe much to the maturation of SQL as the "intergalactic dataspeak" [1990Sto], to its wide availability on every hardware platform, to its standardization supported by all the giants, and to the maturation of ODBC/SQL [1995Gei], which allows any data of any database, be it under an r-DBMS or not, to simulate the RM behavior in the QUERY-ONLY-MODE of SQL. Without all the mentioned protagonists, these sophisticated services would not be taken as seriously as they deserve.

10.15 On the systematic treatment of NULL "Rule 3: Systematic Treatment of NULL Values" [1985Cod] is the only Codd rule not yet discussed in the DATABASE INTEGRITY report. The current state of affairs is the MAYBE symbol (ω) with five competing interpretations: undefined value, unknown value, missing value, inapplicable property and unfilled optional value. We are convinced that {PM Identity & Third normal form} together is "a relationship of mutual benefit" (Wiktionary), and that the {MAYBE Identity & Third normal form} team cannot compare with it. Our approach is given in the UNMARKED NORMAL FORM report.

11. DATABASE REDUNDANCY Currently, data redundancy is the only cause of data corruption and data loss in all relational databases, from the beginning (1970) [1971Cod] and after the commercial expansion (1986) [1985Cod]. DATABASE REDUNDANCY covers the theory, techniques, know-how and policies for preventing data loss and data corruption. Besides, the data redundancy field has a long and brilliant publication record. Let us explain the current Data integrity field, with its theory, its achievements, its myths, its gaps and its memes, and with its own topics and names.

This report deals with the following sections: • Redundancy troubles; • Data lost; • Data corruption; • Delayed insertion; • First cut on data redundancy; • Redundancy effects are vacant; • Data design causing redundancy; • Set of sophisticated predicates; • Business integrity rule; • 3NF is a natural data design; • Zero defect data; • If Zero defect data design … • ... then Zero defect data; • DBMS performance vs. 3NF; • Occasionally Homer takes a nap; and • Normally Homer is Homer.

11.1 Redundancy troubles Database redundancy turns each data modification into a risk of losing essential information. Disk redundancy is a very good thing for everybody, but data redundancy is not. We use the adjective 'redundant' in the following sense: "Data that can be dropped without semantic loss". For example, the codes, e.g. "NY", and their meaning, i.e. "New York", can be centralized in the corresponding kernel table, dropping the meaning (but not the code attribute) from the tables that carried the attribute with its plain meaning. Obviously, the code must be a foreign key referencing the table where the meaning is. New dedicated maintenance programs for the new 3NF relations must be developed, tested and installed. In order not to impair current query programs, the old unnormalized table name can become a view of a natural join of the new repaired tables. But the old maintenance programs, with all their changes and enhancements, go to the wastebasket. This is why normalizing "from the beginning" is good advice.
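The repair just described can be sketched end to end: a kernel table for the code and its meaning, the code kept as a foreign key, and the old table name preserved as a natural-join view. This is a minimal sketch, assuming SQLite via Python's `sqlite3`; SUPPLIER_3NF and the SUPPLIER view are hypothetical names for the repaired table and the old, unnormalized shape.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Kernel table: each code and its meaning are stored once
CREATE TABLE CITY (CityCode TEXT NOT NULL PRIMARY KEY, CityName TEXT NOT NULL);
-- The meaning is dropped from the supplier table; the code stays as a foreign key
CREATE TABLE SUPPLIER_3NF (
    SupplierNr TEXT NOT NULL PRIMARY KEY,
    CityCode   TEXT NOT NULL REFERENCES CITY (CityCode)
);
-- The old table name survives as a view, so query programs keep working
CREATE VIEW SUPPLIER AS
    SELECT SupplierNr, CityCode, CityName FROM SUPPLIER_3NF NATURAL JOIN CITY;
INSERT INTO CITY VALUES ('NY', 'New York');
INSERT INTO SUPPLIER_3NF VALUES ('S1', 'NY'), ('S2', 'NY');
""")

# Renaming the city is now a change in exactly one row
conn.execute("UPDATE CITY SET CityName = 'New York City' WHERE CityCode = 'NY'")
print(conn.execute("SELECT * FROM SUPPLIER ORDER BY SupplierNr").fetchall())
# [('S1', 'NY', 'New York City'), ('S2', 'NY', 'New York City')]
```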

11.2 Data lost The risk of data loss becomes an actual case of data loss every time a delete affects a single obsolete row, or a set of obsolete rows, holding the last information about a business data entity.

11.2.1 Delete anomaly A database deletion anomaly [1971CEF] occurs when we delete an uninteresting individual row and thereby also delete additional facts about another individual, the first deletion being wanted but the second unwanted. Suppose we want to delete an obsolete part for all the suppliers: the deletion performs as expected, except when some supplier only offers this type of part [1971Cod], because together with the obsolete part, the information about its supplier is deleted too (without any business reason).
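The anomaly can be reproduced in a few lines. This is a minimal sketch, assuming SQLite via Python's `sqlite3` and a hypothetical instance of T(SupplierNr, PartNr, City) [1971Cod]; the data are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE T (
    SupplierNr TEXT NOT NULL, PartNr TEXT NOT NULL, City TEXT NOT NULL,
    PRIMARY KEY (SupplierNr, PartNr))""")
conn.executemany("INSERT INTO T VALUES (?, ?, ?)",
                 [("S1", "P1", "London"), ("S1", "P2", "London"), ("S2", "P2", "Paris")])

# Part P2 becomes obsolete and is deleted for all suppliers
conn.execute("DELETE FROM T WHERE PartNr = 'P2'")

# S2 only offered P2, so the fact "S2 is located in Paris" is gone too
print(conn.execute("SELECT City FROM T WHERE SupplierNr = 'S2'").fetchall())  # []
```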

11.3 Data corruption The risk of data corruption becomes an actual case of data corruption every time a change affects an unknown set of redundant rows. Even when the set of corrupted rows is known, the corrupted data persists until it is fully repaired, probably by an "ad hoc" program delivered after the application deliverables.

11.3.1 Update anomaly A database updating anomaly [1971CEF] occurs when we update some attribute of an individual row and then: (a) the number of copies of this information to be changed depends on the number of other individual rows, and (b) if some of the other individual rows are forgotten, the database information becomes inconsistent. For instance, if we change the city of a given supplier, we must change it as many times as there are parts (of that supplier) [1971Cod].

If some part remains unchanged, the city information of the supplier is false in the unchanged parts and correct in the already changed ones, and the database is in an inconsistent state.
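A single forgotten row is enough to produce the inconsistency. This is a minimal sketch, assuming SQLite via Python's `sqlite3` and a hypothetical instance of T(SupplierNr, PartNr, City); the "careless program" updates only one of the supplier's three copies of City.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE T (
    SupplierNr TEXT NOT NULL, PartNr TEXT NOT NULL, City TEXT NOT NULL,
    PRIMARY KEY (SupplierNr, PartNr))""")
conn.executemany("INSERT INTO T VALUES (?, ?, ?)",
                 [("S1", "P1", "London"), ("S1", "P2", "London"), ("S1", "P3", "London")])

# A careless program changes the supplier's city in one row only
conn.execute("UPDATE T SET City = 'Paris' WHERE SupplierNr = 'S1' AND PartNr = 'P1'")

cities = {city for (city,) in conn.execute("SELECT City FROM T WHERE SupplierNr = 'S1'")}
print(sorted(cities))  # ['London', 'Paris']: one supplier, two cities
```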

11.4 Unexpected insertion delay A delayed insertion is unexpected every time the user tries to add new business information, unaware that the data will only be accepted as part of a "future" (currently nonexistent) row. The system asks for some data (part of the composite primary key) that the current user does not know. The user's intentions will remain frustrated until somebody registers the missing row.

11.4.1 Insert anomaly A database insertion anomaly [1971CEF] occurs when we cannot insert a new individual row until we know additional facts about another individual, even though the insertion is conceptually independent of those facts. Suppose we want to store a new supplier of a given city in a relation T(SupplierNr, PartNr, City) [1971Cod]: we cannot enter the new row into T until we know at least one PartNr supplied by the new supplier, because the primary key of T, i.e. (SupplierNr, PartNr), requires both values at once.
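The rejected insertion, and its disappearance once the supplier has its own relation, can be sketched side by side. This is a minimal sketch, assuming SQLite via Python's `sqlite3`; T follows [1971Cod], while the 3NF SUPPLIER(SupplierNr, City) table and the helper `register_supplier` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE T (
    SupplierNr TEXT NOT NULL, PartNr TEXT NOT NULL, City TEXT NOT NULL,
    PRIMARY KEY (SupplierNr, PartNr))""")

def register_supplier(supplier_nr: str, city: str) -> bool:
    """Try to record a new supplier's city in T before any of its parts is known."""
    try:
        conn.execute("INSERT INTO T VALUES (?, NULL, ?)", (supplier_nr, city))
        return True
    except sqlite3.IntegrityError:  # PartNr is a key component and cannot be NULL
        return False

print(register_supplier("S9", "Madrid"))  # False: the insertion is delayed

# In a 3NF design the supplier is an entity of its own and the insert is independent
conn.execute("CREATE TABLE SUPPLIER (SupplierNr TEXT NOT NULL PRIMARY KEY, City TEXT NOT NULL)")
conn.execute("INSERT INTO SUPPLIER VALUES ('S9', 'Madrid')")  # accepted
```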

11.5 First cut on data redundancy The risks of information loss and data corruption are persistent, pervasive and immune to perfect commit protocols, perfect programs and 80% of skilled operational people, because the effects appear in the wrong place, i.e. on the operational side of the database, while the people able to solve the operational problem are in another business area, working on new projects, or even in another company.

11.5.1 Beware of Greeks bearing gifts Besides, there is a bit of immaturity in the data redundancy business, because: (1) The data designer must invest some extra effort designing the redundant tables; (2) The designer's motivation is improving the performance of the operational application; (3) However, the performance gains in response time and CPU time are offset by the operational cost of wasting more disk space; (4) Besides, and concluding, the rampant application dysfunction caused by the mentioned anomalies spreads fast, and the redundant table performs worse than its irredundant counterpart in single-table queries and in table maintenance; (5) Normally, the data designers do not check the opinion of the affected people mentioned above; and (6) Most of the time, the operational people would agree with the redundant designer's proposal.

11.6 Redundancy effects are vacant In fact, only specific data design defects imply database redundancy that affects data integrity, even with an rDBMS fully compliant with Entity integrity, Referential integrity and User integrity. Repairing the design defects that cause redundancy simplifies the database design and the new maintenance programs. The reporting programs only need each 1NF relation to be replaced by a 1NF view joining the corresponding 3NF tables. But this type of maintenance project tends to introduce a "vacant" delay, because the unwanted consequences are operational but solving them is not part of operational duties.

11.7 Data design causing redundancy The only cause of RM database redundancy is an initial, deliberate type of relation design that tries to boost query performance, probably unaware that it is the cause of RM database redundancy.

There are only three data design variants causing unexpected data redundancy in the operational database: • Partial dependence; • Transitive dependence; and • Internal dependence. The following definitions are independent of each other. The general, formal definitions of these design defects can be found in the DESIGNER NORMAL FORMS chapter.

11.7.1 Partial dependence A nonkey attribute has a property of a key attribute [1983Ken] of a composite key.

11.7.2 Transitive dependence A nonkey attribute has a property of another nonkey attribute [1983Ken].

11.7.3 Internal dependence An attribute y of a subset of R has a property of another attribute x of the same designated subset. For instance, given a designated composite primary key (P, F) of R (P, F, D, L, T, E, J), the attribute F, although part of the designated key, has a property of P, which is a conspicuous key attribute of R. The other internal dependence case occurs inside the nonkey subset of R, as the y→z element of the well-established transitive dependence, i.e. IF x→y & y→z THEN x→z.
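Given a designated key, each functional dependency can be sorted into the defect families above by looking only at which side of the key its attributes fall on. This is a simplified sketch, not the formal definitions of the DESIGNER NORMAL FORMS chapter: `classify` is a hypothetical helper, and the nonkey y→z case that 11.7.3 also calls internal is reported here under its transitive name.

```python
from typing import AbstractSet

def classify(lhs: AbstractSet[str], rhs: AbstractSet[str], key: AbstractSet[str]) -> str:
    """Name the kind of the functional dependency lhs -> rhs against a designated key."""
    lhs, rhs, key = frozenset(lhs), frozenset(rhs), frozenset(key)
    if rhs <= lhs:
        return "trivial"
    if lhs >= key:
        return "full"        # the whole key (or a superkey) determines rhs: no defect
    if lhs < key and rhs <= key:
        return "internal"    # one key component has a property of another (11.7.3)
    if lhs < key and not (rhs & key):
        return "partial"     # part of a composite key determines nonkey attributes (11.7.1)
    if not (lhs & key) and not (rhs & key):
        return "transitive"  # a nonkey attribute determines another nonkey one (11.7.2)
    return "unclassified"    # mixed cases this sketch does not name

key = {"P", "F"}  # designated composite key of R (P, F, D, L, T, E, J)
for lhs, rhs in [({"P"}, {"F"}), ({"P"}, {"D"}), ({"L"}, {"T"}), ({"P", "F"}, {"J"})]:
    print(sorted(lhs), "->", sorted(rhs), classify(lhs, rhs, key))
# internal, partial, transitive, full
```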

11.8 Set of sophisticated predicates In 1979 Codd, looking for "meaningful units that are as small as possible" [1979Cod], presented some new database predicates that we know as "RM/T entities", and he opened the way to data design in Third normal form [1971Cod] from the beginning. The RM/T entities are a type of predicate whose instances can still share a polyadic set, but now directly, i.e. without "further normalization" [1970Cod]. The previous "normal form" [1970Cod] of a relation, i.e. R (P, F, D, L, T∊B, E, J), was unaware of the small digressions it allowed into the descriptive discourse, e.g. an attribute with the property of another attribute that is not the predicate subject. This semantic research not only redresses such digressions; it also provides the patterns that guide the data designer in his daily duties. From the Logician's point of view, this step is an imaginative and interesting advance. But from the point of view of data designers, database administrators and the people responsible for daily database operation, it is a giant step.

11.8.1 Class predicate A class predicate "is not existence dependent on any other entity" [1979Cod]; it has one standalone primary key and no descriptive attributes.

11.8.2 Associative predicate The associative predicate is a two-place predicate like xRy, i.e. 〈x, y〉, which also recognizes the internal existence of the converse relation, i.e. 〈y, x〉. For example, R (P, L), where P is the EmpNo of a programmer and L is the EmpNo of a project leader, when leaders and programmers can share projects.

11.8.3 Denotative predicate The denotative predicate, as stated, is the special RM/T associative predicate dealing with one-to-one ERA relationships. Codd never mentioned its particular role.

11.8.4 Kernel predicate Kernel is a conspicuous descriptive predicate: a subject and the set of attributes corresponding to the subject's properties. A kernel entity can participate in a descriptive family, i.e. its single primary key is referenced by nonkey attributes of more than one entity. A kernel entity can be the seed of a characteristic hierarchy. Finally, a kernel identifier can be one of the subjects of an associative predicate.

11.8.5 Characteristic predicate A characteristic predicate "fills a subordinate role" in describing a superordinate individual by a set of its own predicate instances (not by accumulation of properties). A characteristic description consists of a "vertical" set of predicates, not a "horizontal" set of properties. The subject of a characteristic entity is an ordered denotation of two parts; for instance, in the STATE predicate, which characterizes COUNTRY, StateCode (the subject of STATE) is definitely denoted by CountryCode and StateId. In the following SPACE database schema, we can see some of the mentioned predicates at work: SPACE: { COUNTRY (CountryCode, CountryName); STATE (〈〈CountryCode∊COUNTRY〉, StateId〉 AS StateCode, StateName)}. The first part is the subject of the characterized entity and the second is a sequence numbering the "children" rows that "characterize" the superordinate predicate. P. Chen called this type of predicate, whose existence depends on the subject of the superordinate entity, a "weak entity" [1976Che]. The Characteristic table pattern entry completes the practical details of the characteristic predicate.

11.9 Business integrity rule The data integrity rule abstracted from the new predicates of RM/T entities is: "ONE ENTITY PER RELATION" FROM THE BEGINNING, WHOSE OPERATIVE RESULT IS "ONE ENTITY CASE PER TUPLE" IN EVERY TABLE [1983S&B] OF THE PRODUCTION DATABASE FOREVER. Partial and transitive patterns are left to the functional reports and their underlying data views built with natural joins.

11.9.1 Failsafe redundancy The database designer community knows that other worlds exist in which redundancy plays positive roles. But all the worlds are in this one. We enjoy Quine's example of a redundant micro-world: "A judicious redundancy, even so, is the breath of life. It is fallback and failsafe. It is why we address our mail to city and state in so many words, despite the zip code. One indistinct digit in the zip code could spoil everything" [1987Qui].

11.9.2 Failsafe irredundancy We perform, and enjoy, the "breath of life" when designing functional reports and the judicious, fallback, failsafe and redundant data views underlying those reports. It is also true that in all our possible micro-worlds, "zero percent of hidden redundancy in production database" on the basis of "zero defect data design deliverables" is more "fallback and failsafe" than any other alternative.

11.10 3NF is a natural data design When you design tables directly in third normal form, "you have four goals" [1999Poo]:
• "Arranging data into logical groupings such that each group [of attributes] describes its functional part of the" [1999Poo] micro-world of interest (no more, no less);
• Increasing data integrity and maximizing the cross-references between tables;
• Building a Logical database in which Physical designers will refine their standard constructs, progressively empowering the application queries until they reach a steady state;
• "Organizing the data such that, when you modify it, you make the change in only one place" [1999Poo], which increases data manipulation performance and avoids data redundancy; and without redundancy, the only known way in which a design collaborates with data corruption risks also goes away, i.e. data loss after deletes, delayed insertions and cascades of changes.

11.10.1 3NF is the RM/T default "For each base relation, the DBMS assumes that all columns that are not part of the primary key are functionally dependent on the primary key, unless otherwise declared" [1979Cod].

11.11 Zero defect data "Zero defect data" [1991Han] in the Logical design means zero database exceptions in the production environment: zero insertion anomalies, zero update anomalies and zero delete anomalies. The risks of insufficient normalization are the following [1971Cod]: Insertion anomaly.- After correctly preparing a new row, the program gets a rejected insertion; Update anomaly.- Before a punctual change of the value of an attribute of a row of a table, and knowing the redundancy of the current database schema, you need to perform not one but a saga of changes (under penalty of data corruption); and Delete anomaly.- After deleting an obsolete row, you realize that some information of interest has been lost.

11.12 IF {Zero defect data design} … At the design phase, designers can choose between two Data design styles: the prescriptive LEGO® DATA DESIGN or the pattern-oriented HOLISTIC DATA DESIGN. Either way, the designer will get "one entity type per relation" and, after delivery, the immediate consequence will be "a set of irredundant rows" [1971Cod] in the production database forever. In the environment of the stress Test phase, and under the care of its author, each table can take part in a Data Review. Normalization is the main procedure of a DATA REVIEW: it performs the dependency analysis on the table rows of the stress environment to obtain the primitive set of FD's from the corresponding observations. This type of data review does not merely produce a neutral report of the design defects of the table: every normalization exercise comes with a full proposal for the table's author. The proposal is always a micro-schema with a set of "zero defect tables" or, in positive words, tables that each contain their Logic predicate and only their Logic predicate.
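The observational step of such a Data Review can be sketched as a search for functional dependencies that no stress-test row refutes. This is a minimal sketch under stated assumptions: `fd_holds` and `observed_single_fds` are hypothetical helpers, only single-attribute dependencies are tried, and the rows are invented. Observations can refute a dependency but never prove one.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

Row = Dict[str, Hashable]

def fd_holds(rows: Sequence[Row], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """True if no two observed rows agree on lhs but disagree on rhs."""
    seen: Dict[tuple, tuple] = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False  # a counterexample refutes lhs -> rhs
    return True

def observed_single_fds(rows: Sequence[Row], attrs: Sequence[str]) -> List[Tuple[str, str]]:
    """All a -> b (single attributes, a != b) not refuted by the observed rows."""
    return [(a, b) for a in attrs for b in attrs if a != b and fd_holds(rows, [a], [b])]

# Hypothetical stress-test rows of T(SupplierNr, PartNr, City)
rows = [
    {"SupplierNr": "S1", "PartNr": "P1", "City": "London"},
    {"SupplierNr": "S1", "PartNr": "P2", "City": "London"},
    {"SupplierNr": "S2", "PartNr": "P1", "City": "Paris"},
]
print(observed_single_fds(rows, ["SupplierNr", "PartNr", "City"]))
# [('SupplierNr', 'City'), ('City', 'SupplierNr')]
```

SupplierNr→City is the partial dependence that motivates splitting T, whereas City→SupplierNr is only a coincidence of this small sample; telling the two apart is why the proposal goes back to the table's author.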

11.13 … THEN {Zero defect data} (1) Data design quality is conformance to requirements; (2) Defect prevention is preferable to data review and defect correction, because the latter will probably be performed by other people; (3) Knowing the 3NF lemma of "Each entity type its own relation", "zero partial dependence" and "zero transitive dependence" defects is the standard quality for every relation in the production database; (4) Quality can be measured in monetary terms, i.e. the price of ignoring 3NF [1990Han]; (5) Application functionality first, then: (5a) Module cohesiveness for maintainability, economy and lower costs of new development; then,

(5b) Query performance is a matter of stepwise refinement: probably the query programmer will develop the first cut and some guessed refinements during the Stress Test phase, but the database administrator will continue refining the application performance as part of the RM Physical Design duties (r-DBMS performance duties are described in the APPENDICES).

11.14 3NF and DBMS performance "Ideally one would like to have a system where he can specify his data structures at whichever level is most convenient and then have the system pick the best possible implementation" [1971Ear]. Thanks to the Relational Model, DBMS performance is not a matter of Logical Design but a matter of Physical Design, automatic DML optimization and Database Tuning of the involved tables and programs in the peak hours, until reaching a steady state of 80% CPU consumption [1981Mos] overall, all according to the well-established methods of stepwise refinement of every system bottleneck [1989SAG]. Doing our best at designing data is our business. And after the happy delivery of a new application to the production environment, the system programmer and the DBA will also do their best to keep delivering the agreed level of service of the new system without impairing the rest of the systems, whether the tables are in 1NF, 2NF or 3NF, and whether the programs are written in Gnu MDK, Gnu COBOL, PHP 5.6, HTML5, C++ or Pascal 7.0.

11.15 Occasionally Homer takes a nap Codd himself and three conspicuous advocates of Normalization, i.e. our Homers, take the same kind of "nap": all of them sincerely think that database performance can be influenced by the Logical design. “One of the aims of normalizing a collection of relations is to make the insertions, updates, and deletions clear in meaning and therefore easily understandable. Normalization has little to do with pure retrieval. In fact, normalization usually involves breaking relations into relations of smaller degree (those with fewer columns); This tends to reduce performance on pure retrieval because many more joins must often be executed” [1990Cod]. “In a sense, normalization optimizes update performance, at the expense of retrieval performance. Sometimes, in order to improve retrieval performance, however, it may be desirable to put two facts in one place, or one fact in two places. If you follow normalization strictly, however, you’re not allowed to make tradeoffs of that sort” [1986Dat]. "As might be expected, interactive response slowed down during the execution of very complex SQL statements involving joins of several tables. This performance degradation must be traded off against the advantages of normalization, in which large database tables are broken into smaller parts to avoid redundancy, and then joined back together by the view mechanism or user applications" [1981Cha]. "With respect to performance trade-offs," normalization is "biased toward the assumption that all nonkey fields will be updated frequently. They tend to penalize retrieval, since data which may have been retrievable from one record in an unnormalized design may have to be retrieved from several records in the normalized form. There is no obligation to fully normalize all records when actual performance requirements are taken into account" [1983Ken].


11.16 Normally Homer is Homer The following six statements are truer than the four quotations of the previous subsection:
☺ “A base relation, may be fully normalized or not” [1990Cod], because "there is no stipulation that a relational database will be designed to have minimal redundancy, although this is an option that may be chosen" [1979Cod].
☺ “Normalization was originally conceived as a systematic way (with proper theoretical foundations, of course) of ensuring that a logical design of a relational database would be free from insertion, update, and deletion anomalies. And indeed, designs that are proposed today can be defended on a rational basis!” [1990Cod].
☺ "Moreover, performance tradeoffs occur not just between retrievals and updates, but between different retrievals too; queries that need to access the narrower normalized tables will be forced to access wider denormalized tables" (F. Pascal, "On Normalisation", in the "DataBase Debunking" web page).
☺ Join materialization, under the label of "snapshot", was proposed in 1980 by Adiba & Lindsay [1980A&L], and it was implemented by the major commercial r-DBMSs around the end of the 20th century as part of the data warehousing trend [2000Dat].
☺ There is no micro-world in which DBMS performance may be bought at the cost of degraded database functionality, i.e. by allowing data loss, data corruption and uncontrolled information delays.
☺ There is no standard DBMS benchmark that is not in Third normal form [1993Gra].

11.17 Normalization trade-offs meme "De tales polvos, tales lodos" ("from such dust, such mud"), Quevedo (1622). The mental map of the meme is "RM data design can influence future application performance". But it cannot. A Google search (Explorer, 2015-11-23, 8:00-8:40) returned: "database normalization" (2,240,000 results in 0.34 seconds); + "performance" (about 987,000 results in 0.31 seconds); + "trade-offs" (about 2,120,000 results in 0.40 seconds).

11.17.1 A sample of the normalization meme "Normalization enhances the integrity of the data by minimizing data redundancy and inconsistency. However, there may be additional performance costs for retrieval of data in certain applications. As a result, the physical database design may deviate from the normalized form due to performance considerations". (Craig Borysowich Apr 6, 2007).

12. IRREDUCIBLE UNF DATA MODEL UNF is the acronym of "Unnormal form", which is a synonym of Codd's "Unnormalized relation" [1971Cod] but with an RM-compliant primary key [1971CEF]. "Three of the principal kinds of data dependencies which still need to be removed are: Ordering dependence, indexing dependence, and access path dependence" [1970Cod]. Following E.F. Codd's English usage of 1970, the term 'dependence' will keep its association with the data design defects coined by Codd, i.e. "partial dependence", "transitive dependence", etc., while 'dependency' is reserved as a neutral or positive term, as in "functional dependency". This report is on the data dependence between pairs of attributes, or pairs of irredundant subsets, inside irreducible non-first normal forms. Do not be reluctant to use functional dependency as an analytical tool in this foreign area too: at DATABASE DEPENDENCY>FD semantics, please find how to interpret "x→y" in a way that is friendly to you.

The idea is to apply Functional Dependency analysis to the prerelational database structures, formally characterizing their data redundancy along the following entries:
• Storage representation details;
• Data integrity & Data dependence;
• Unnormalized relation;
• Unnormal form;
• Unnormal form family;
• Database UNF objects;
• Unnormal form protocol;
• Irreducible unnormal form;
• UNF record uniqueness;
• UNF DBMS reduces redundancy;
• UNF database redundancy;
• RM PK is minimal;
• UNF PK is RM compliant; and
• Normal Procedure.

12.1 Storage representation details «The relational model for formatted databases [1970Cod] was conceived ten years ago, primarily as a tool to free users from the frustrations of having to deal with the clutter of storage representation details. This implementation independence is coupled with the power of the algebraic operators on n-ary relations and the open questions concerning dependencies (functional, multivalued, and join)» [1979Cod].

12.2 Data integrity & Data dependence Database integrity is the strategic target of the Data integrity field. Dependence is the grammar of reference for formalizing Data redundancy; and, definitely, Data redundancy is the descriptive source of Data integrity threats.

12.3 Unnormalized relation "An unnormalized relation is one which is not in first normal form" [1971Cod], [1987Miu]. In this context, the 3NF relations are a subset of the 2NF relations, and the 2NF relations are a subset of the 1NF relations.

12.3.1 UNF design vs. UNF-DBMS designer Any DBMS product (be it fully RM compliant [1985Cod] or not) can support in production a designed flat record type (the name, outside the RM world, of a relation in 1NF), even a full schema of flat records. For instance, from 1974 ADABAS was considered a relational product for having "relations of assorted degrees" [1986Dat] manipulated by "a low language of boolean selection expressions" [1986Dat], although such relations could be extended with a multiple field or a periodic group. Nevertheless, in 1981 the ADABAS User Group of Spain, quite naturally, recommended 3NF for all database schemata, as the author experienced directly.

12.4 Unnormal form The Unnormal form (abbr. UNF) world is a mix of some atomic fields, e.g. a minimal primary key, and some nonatomic field structures. Every "father" of a father-children hierarchy always has a single-valued primary key, which is irredundant and minimal because it is single valued, and whose fields are RM compliant, according to the examples of Codd [1971CEF]. Please see the Unnormal form protocol below.

12.5 Unnormal form family Unnormal form relation refers to: Disk file physical organizations; Hierarchical structures; CODASYL structures; and Object oriented structures.

12.5.1 Disk file physical organizations Disk file physical organizations include ISAM, BDAM, etc. of current operating systems, more or less wrapped by wonderful precompiler libraries for committed freelancers.

12.5.2 Hierarchical structures Hierarchical structures are those of the never-ending legacy DBMSs.

12.5.3 CODASYL structures CODASYL structures refer to the standard owner-member structures of CODASYL.

12.5.4 Object oriented structures Object oriented structures refer to the ODMG structures of [1992Cat], but also to the same concepts under different names in [1991Gra] and [1995SAG]. All of them are revamping the pre-relational database world, probably forever, in living niches supported by SQL-OO [1990Sto], [2000Mis].

12.6 Database UNF objects A field containing an atomic value of a domain is the seed of the remaining UNF object domains, starting with the aforementioned UNF primary key.

12.6.1 1997 Database objects The ODMG [1997Cat] objects are the following: (a) An enumeration is a small series of atomic values; (b) A collection is a finite class of fields, i.e. a vector of homogeneous values; (c) An ordered collection is a collection ordered by the value of its members; (d) A bag is a vector of atomic fields with homogeneous atomic values which is not a set; (e) The order of arrival of each atomic value into the bag is kept for controlling the bag integrity; (f) '⋿' is the Z notation for bag membership (Cambria Math MS font); (g) A struct is a persistent structure constructor, formerly a record type. A struct is a flat enumeration of atomic and nonatomic domains (in any combination); and (h) An array is a persistent structure constructor of a free combination of struct, field, enumeration, collection and bag.

12.6.2 1970 Database objects The database "objects" of 1970 (i.e. "avant la lettre") were the following (please see [2000Sim]): (1) A multiple-field (abbr. MU) is a bag type whose order of arrival is not maintained automatically and which has an extra read-only MU-count field at the beginning [1989SAG]; (2) A periodic group (abbr. PE) is an array type of a struct of several fields, which has an extra read-only PE-count field at the beginning [1989SAG].

12.7 Unnormal form protocol Our mental map of UNF relations is a one-level hierarchy. We will try to deal with the data integrity risks of a production database designed following the customary patterns of a UNF relation of a one-level hierarchy, i.e. one 1NF kernel predicate, e.g. CITY, plus one [embedded] UNF characteristic predicate, e.g. [DISTRICT:]. DISTRICT is a periodic group, i.e. a set of 3NF predicates (represented here by {DistrictNo, DistrictName}), thanks to its own "child" discriminator, e.g. DistrictNo. For example: CITY (CityCode, CityName, DistrictCount, [DISTRICT:] {DistrictNo, DistrictName}). CityCode is the primary key of the CITY relation, and (CityCode, DistrictNo) is the "primary key" of each sub-row of the [DISTRICT:] periodic group.

12.7.1 UNF flat records are out of protocol We assume that a UNF database schema whose relations have only atomic fields inside has the same data integrity troubles that any of our designers faces when dealing with a relational schema. In other words, a UNF schema of flat records is out of the scope of this report.

12.8 Irreducible unnormal form An irreducible unnormal form is a UNF relation whose components (i.e. the father and its unitary children records) are in third normal form, as Miura & others point out: "We discuss the strategy to get canonical forms in terms of FDs and MVDs. In this section, we suppose all the [UNF] relation [components] are in 3NF, which are mechanically obtained" [1987Miu]. In order to capture the specific type of redundancy implied by UNF, we will assume that, IF each "record type" of every non-atomic structure WERE AN ISOLATED RELATION, THEN IT WOULD BE IN THIRD NORMAL FORM [1987Miu], [1971CEF].

12.8.1 Original UNF protocol credits It is worth noting that UNF database redundancy was ignored or unmentioned in the RM literature, and that the formal analysis of UNF redundancy has been feasible thanks to the original concept of "irreducible non first normal form relations" [1987Miu] of T. Miura, K. Moriya and H. Arisawa. Some other OO designers, such as Grh… [1991Grh], Ambler [1998Amb] and Lee [1995Lee], share the concept of having irredundant objects in 3NF. In fact, an example of Ambler follows.

12.8.2 Class normalization in 3ONF The following figure "shows the resulting class diagram for Address in 3ONF. Notice how each class is highly cohesive, encapsulating a single set of related behavior. Also notice how each class is easy to understand quickly, making it easier to maintain and to enhance" [1998Amb].

12.9 UNF record uniqueness In the UNF world, the logical record starts by defining an atomic field, seldom called "primary key", that maintains the uniqueness of the record itself on the basis of building up a physical index with some explicit facility available to the designer.

12.9.1 UNF ISN & 1NF ROWID Sometimes UNF logical record uniqueness is accompanied by automatically adding an initial fixed-length field to the record with an internal sequential number which is unique (abbr. ISN), assigned and maintained by the u-DBMS with a controlled sequential assignment mechanism. This technique is known as ROWID in the RM world [1990Sto], and its fixed length allows any type of DBMS an economical manipulation of the indexes (and the upper index levels) independent of the variable size of the primary keys, a smaller space for database indexes and a smooth estimation of index space per relation.
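SQLite offers one concrete, observable instance of a DBMS-assigned row number: an ordinary SQLite table carries an implicit 64-bit integer `rowid` that the engine assigns on insert, whatever the declared primary key. The sketch below is only an illustration under that assumption; the ISN of a UNF DBMS, or the ROWID of other products (which may be a physical address), can behave differently, and the `city` table and its rows are hypothetical.

```python
import sqlite3

# Hypothetical CITY table whose declared primary key is a variable-length text code.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (CityCode TEXT PRIMARY KEY, CityName TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO city (CityCode, CityName) VALUES (?, ?)",
    [("MUR", "Murcia"), ("CTG", "Cartagena"), ("LORCA-01", "Lorca")],
)

# SQLite fills the hidden integer rowid by itself (1, 2, 3 for an initially empty
# table), independently of the length of CityCode.
rows = conn.execute("SELECT rowid, CityCode FROM city ORDER BY rowid").fetchall()
for rowid, code in rows:
    print(rowid, code)
```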

12.10 UNF DBMS reduces redundancy "An integrated database implies the control of redundant data. In a conventional system of independent files and applications, data redundancy is difficult to avoid" [1975Pal]. UNF DBMSs (hierarchical, CODASYL, inverted-list based or object oriented) reduce to zero all the data redundancy coming from application-oriented file redundancy, but with their sophisticated structures they introduce their own type of redundancy, which seems transparent to Codd's eyes.

12.11 UNF database redundancy Although unmentioned by the RM literature, UNF database structures are redundant under the performance-oriented data design strategy of "The records used together, go together", which means that the subordinate characteristic rows of any kernel father go with their father in the same logical record, in the form of a variable-size group of children sub-records, member rings or a periodic group.

12.11.1 UNF generic protocol A UNF relation is a relation of variable rows whose first level is a flat structure, e.g. FATHER (F, A₁ ...), plus a set of members, each child being a fixed substructure, e.g. CHILD [1] {Ch, B₁ ...}. The father part is a fixed set of fields; in a given row the father part is mandatory, but the child set can be empty. As stated, the FATHER fixed-part predicate is in 3NF, and so is the predicate of each current child.

12.11.2 UNF partial dependences However, the whole variable UNF logical record suffers from partial dependence: ⇨ F functionally determines all FATHER.Aₓ fields; ⇨ (F·Ch) functionally determines all CHILD.Bₓ fields. These two statements together form a partial dependence pattern per PE-GROUP occurrence, i.e. considering {FATHER+(n)Periodic GROUP} as a flat record, we will find a variable number of partial dependences per logical record.

12.11.3 UNF-DBMS FILE redundancy The picture is that, given a FATHER RECORD (together with its PE GROUP), any field value of any CHILD substructure is infallibly repeated inside several CHILD substructures of the remaining FATHER records. Even a given CHILD field value can be found several times in the same PE GROUP. The data redundancy coming from application-oriented files "is difficult to avoid" [1975Pal]. But in the legacy DBMS structures, i.e. one-level hierarchies, CODASYL owner-member structures and HEADER-(n)PE-GROUP, the data redundancy is voluntary, according to the mixture of Logical & Physical design; such a design probably influences query application performance positively, although it penalizes UNF relation maintenance. All of this was so perfectly established and assumed that it was transparent (even to the RM pioneers).

12.12 UNF PK is a minimal composition The "normalization" [1970Cod], i.e. the Normal procedure that follows, is "applicable" [1970Cod] to an "unnormalized collection of relations" [1970Cod] that "satisfy the following conditions" [1970Cod]: (1) "The graph of interrelationships of the nonsimple domains is a collection of trees" [1970Cod]; (2) "No primary key has a component domain which is nonsimple" [1970Cod], i.e. a UNF PK is an RM PK. "The writer knows of no application which would require any relaxation of these conditions" [1970Cod]. Therefore the UNF primary key is 1NF compliant, i.e. it is minimal when simple, and irredundant and minimal in composite cases.

12.13 First normalization (STUB) populate the UNF and the resulting tables; they seem to be in irreducible 1NF (encapsulated 3NF). "The possibility of eliminating nonsimple domains appears worth investigating, e.g. M. E. Sanko of IBM, San Jose, independently recognized the desirability of eliminating nonsimple domains" [1970Cod]. "There is, in fact, a very simple elimination procedure, which we shall call normalization" [1970Cod], and, after 1971, "First normalization", which is the first step of Database normalization.

12.13.1 Normalization proceeds as follows. «Consider, for example, the collection of relations exhibited in Figure 3 (a): jobhistory and children are nonsimple domains of the relation employee. salaryhistory is a nonsimple domain of the relation jobhistory» [1970Cod]. «The tree in Figure 3(a) shows just these interrelationships of the nonsimple domains» [1970Cod].

FIG. 3(a) Unnormalized set [1970Cod]:
employee (man#, name, birthdate, jobhistory, children)
jobhistory (jobdate, title, salaryhistory)
salaryhistory (salarydate, salary)
children (childname, birthyear)

«Starting with the relation at the top of the tree, take its primary key and expand each of the immediately subordinate relations by inserting this primary key domain or domain combination. The primary key of each expanded relation consists of the primary key before expansion augmented by the primary key copied down from the parent relation. Now, strike out from the parent relation all nonsimple domains; Remove the top node of the tree; and Repeat the same sequence of operations on each remaining subtree» [1970Cod].

FIG. 3(b) Normalized set [1970Cod]:
employee (man#, name, birthdate)
jobhistory (man#, jobdate, title)
salaryhistory (man#, jobdate, salarydate, salary)
children (man#, childname, birthyear)

«The result of normalizing the collection of relations in Figure 3(a) is the collection in Figure 3(b). The primary key of each relation is italicized to show how such keys are expanded by the normalization» [1970Cod].
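The quoted procedure can be sketched in Python, assuming the tree of nonsimple domains is encoded as nested dicts and lists. The `Relation` class, the `normalize` function and the sample employee record are hypothetical illustrations, not Codd's notation; the sketch only shows the three moves of the procedure (copy the parent key down, strike out nonsimple domains, repeat on each subtree) applied to Fig. 3(a).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Relation:
    """One node of the tree of nonsimple domains (hypothetical encoding of Fig. 3(a))."""
    name: str
    key: list[str]       # primary key before expansion
    simple: list[str]    # simple domains kept in the normalized relation (key included)
    children: dict[str, Relation] = field(default_factory=dict)  # nonsimple domain -> subtree


def normalize(rel: Relation, rows: list[dict], inherited: tuple = (), out: dict | None = None) -> dict:
    """Copy the parent key down, strike out nonsimple domains, repeat on each subtree."""
    if out is None:
        out = {}
    table = out.setdefault(rel.name, [])
    for row in rows:
        parent_key = dict(inherited)
        # Expanded row: inherited key domains first, then this relation's simple domains only.
        table.append({**parent_key, **{a: row[a] for a in rel.simple}})
        # Expanded key handed down to every immediately subordinate relation.
        expanded = tuple(parent_key.items()) + tuple((k, row[k]) for k in rel.key)
        for domain, sub in rel.children.items():
            normalize(sub, row.get(domain, []), expanded, out)
    return out


salaryhistory = Relation("salaryhistory", ["salarydate"], ["salarydate", "salary"])
jobhistory = Relation("jobhistory", ["jobdate"], ["jobdate", "title"],
                      {"salaryhistory": salaryhistory})
children = Relation("children", ["childname"], ["childname", "birthyear"])
employee = Relation("employee", ["man#"], ["man#", "name", "birthdate"],
                    {"jobhistory": jobhistory, "children": children})

# One hypothetical unnormalized employee record.
data = [{
    "man#": 1, "name": "Smith", "birthdate": "1930-05-01",
    "jobhistory": [{"jobdate": "1960-01-01", "title": "Clerk",
                    "salaryhistory": [{"salarydate": "1960-01-01", "salary": 500},
                                      {"salarydate": "1961-01-01", "salary": 550}]}],
    "children": [{"childname": "Ann", "birthyear": 1958}],
}]

tables = normalize(employee, data)
for name, rows in tables.items():
    print(name, rows)
```

The column order of each produced row follows Fig. 3(b), e.g. salaryhistory rows come out as (man#, jobdate, salarydate, salary).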

12.14 RM PK is a minimal composition All 1NF relations have an irredundant and minimal primary key as part of RM compliance, in such a way that "PK 1NF compliant" and "PK RM compliant" are synonyms. {UNIQUENESS}{MINIMAL}{NONREDUNDANT} But not only that: the primary keys coming from the UNF world are already simple or a composition of simple domains; therefore the RM world inherits from the UNF world a concept of PK uniqueness that infallibly includes its minimality in the composite case and, consequently, its irredundancy.

13. ZERO NORMAL FORMS Zero normal form (abbr. 0NF) is a label covering the design defects that must be solved before passing from a First normal form to a Third normal form.

In other words, this report displays known defects that must be repaired before directly designing a table without partial and without transitive dependence. Solving each of them is a matter of different methods belonging to the ideal library ActiveFirstNormalForm. The ZERO NORMAL FORMS report includes the following sections:
• Zero normal form;
• R name includes a database value;
• Inflated primary key;
• PK component has not a genuine value;
• Vector of attributes;
• Range of attributes;
• Horizontal relation; and
• Catalog of design pitfalls.

13.1 Zero normal form Zero normal form (abbr. 0NF) is a label covering the no-man's land between a true Unnormalized relation (aka Unnormal form, abbreviated UNF) and the First normal form. A Zero normal form relation is not UNF compliant, because it has no UNF structures (e.g. a periodic group), but neither is it 1NF compliant, because it has some design defect incompatible with direct normalization.

13.1.1 Zero normal form protocol In addition, a Zero normal form relation is a relation R having only one type of 0NF defect, perhaps with more than one instance of that defect. More than one type of 0NF defect in the same R is out of "bona fide" data design protocols.

13.2 R name includes a database value «One 'item' table containing 12 individually named months» [1983Dat] INSTEAD OF «An 'item month' table where the 12 month values will be occurrences of the 'Month' attribute» [1983Dat]. The example can be generalized. The defective R in 0NF is a "wild distribution" of the ideal table R, having a data value as part of the development database directory names: R¹, R², etc. The corresponding recipe of the ideal library is the following: (Step 1) Add a new attribute, e.g. Month, to every R¹, R², etc. table; (Step 2) Populate every R¹, R², etc. with a constant value coming from its own table name; (Step 3) Create R identical to one of the current Rⁱ; (Step 4) Unite all the rows of R¹, R², etc. into R; (Step 5) Simplify the documentation, replacing R¹, R², etc. by R. Now R is ready for normalization, i.e. it is in 1NF.
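The five steps of the recipe can be sketched with SQLite, assuming three hypothetical month tables `item_jan`, `item_feb` and `item_mar` standing for R¹, R², R³. Two choices here are assumptions rather than part of the recipe: in Step 3 the key of R is widened to (ItemNo, Month), and Step 5, which is a documentation step, is approximated by dropping the old tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical 0NF schema: the Month value is hidden in the table names R1, R2, R3.
month_tables = {"item_jan": 1, "item_feb": 2, "item_mar": 3}
sample = {"item_jan": [(10, 5), (11, 7)], "item_feb": [(10, 6)], "item_mar": [(11, 2)]}
for name, rows in sample.items():
    conn.execute(f"CREATE TABLE {name} (ItemNo INTEGER PRIMARY KEY, Qty INTEGER NOT NULL)")
    conn.executemany(f"INSERT INTO {name} VALUES (?, ?)", rows)

# Steps 1 and 2: add the Month attribute and fill it with the constant taken from the table name.
for name, month in month_tables.items():
    conn.execute(f"ALTER TABLE {name} ADD COLUMN Month INTEGER")
    conn.execute(f"UPDATE {name} SET Month = ?", (month,))

# Step 3: create R with the same columns; here the key is also widened to (ItemNo, Month).
conn.execute("""CREATE TABLE item_month (
                    ItemNo INTEGER NOT NULL, Qty INTEGER NOT NULL, Month INTEGER NOT NULL,
                    PRIMARY KEY (ItemNo, Month))""")

# Step 4: unite all the rows of R1, R2, R3 into R.
for name in month_tables:
    conn.execute(f"INSERT INTO item_month (ItemNo, Qty, Month) SELECT ItemNo, Qty, Month FROM {name}")

# Step 5 is a documentation step; dropping the old tables is one way to retire their names.
for name in month_tables:
    conn.execute(f"DROP TABLE {name}")

rows = conn.execute("SELECT ItemNo, Month, Qty FROM item_month ORDER BY ItemNo, Month").fetchall()
print(rows)
```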

13.3 Inflated primary key This section has the following subsections:
o UNF redundant ISAM index;
o Designated primary key;
o RM candidate keys;
o RM designated keys;
o Inflated candidate key;
o No minimal candidate key; and
o Redundant candidate key.

13.3.1 UNF redundant ISAM index A UNF designated "primary key" (correlated with the immediate physical design of the corresponding index) can be redundant, as can be appreciated in the following real-life "black pearl". «For example, assume that in the ISAM file AUTHOR, FirstName and LastName are stored in two separate fields of each record. One option would be to define the AUTHOR index as a single field key consisting of the LastName field. This would cause the AUTHOR index to order the records by LastName. A second option would be to define the AUTHOR index as a compound key consisting of the LastName field followed by the FirstName field. This would cause any records with the same LastName to be further ordered by FirstName. The first option would require less storage space since the FirstName would not be stored in the index. However, the second option would provide somewhat faster access times, if there were a large number of AUTHOR records with the same LastName. With the addition of the FirstName field, each key would point closer to the desired record, which would result in less sequential searching» [1985Mix]. Note that neither of the two physical PKs is unique. The mix of Physical & Logical design is also an undesirable UNF dependence.
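The claim that neither physical "PK" is unique can be checked mechanically: a UNIQUE index cannot be built on either option as soon as two different authors share the indexed names. This is a sketch with SQLite and hypothetical AUTHOR rows (the `Country` column is invented only to make the two John Smiths distinct records); SQLite refuses a UNIQUE index over duplicate values with `sqlite3.IntegrityError`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE author (LastName TEXT, FirstName TEXT, Country TEXT)")
# Hypothetical sample: two different authors share LastName and FirstName.
conn.executemany("INSERT INTO author VALUES (?, ?, ?)",
                 [("Smith", "John", "UK"), ("Smith", "John", "US"), ("Smith", "Ann", "UK")])


def unique_index_possible(conn, name, columns):
    """Try to build a UNIQUE index; SQLite rejects it if the current rows contain duplicates."""
    try:
        conn.execute(f"CREATE UNIQUE INDEX {name} ON author ({', '.join(columns)})")
        return True
    except sqlite3.IntegrityError:
        return False


option1 = unique_index_possible(conn, "author_ln", ["LastName"])
option2 = unique_index_possible(conn, "author_ln_fn", ["LastName", "FirstName"])
print(option1, option2)  # False False: neither option identifies an AUTHOR record
```

A successful check on one instance would not prove uniqueness in general; a failed one, as here, is enough to refute it.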

13.3.2 Designated primary key Unhappily, during Logical design a designated PK is not under the aforementioned automatic RM compliance umbrella. ANYBODY CAN DESIGNATE:
i. A REDUNDANT UNIQUE INDEX (UNF world);
ii. AN IRREDUNDANT BUT NON-MINIMAL PK (both worlds);
iii. A REDUNDANT PK (both worlds).

13.3.3 RM candidate keys "A candidate key of a relation R* is a subset K of R such that for any distinct tuples t₁ and t₂ in R*, t₁ (K) ≠ t₂ (K) and no proper subset K′ of K shares this property" [1983Mai].
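Maier's definition translates directly into a test on one relation instance: K must separate every pair of distinct tuples, and no proper subset of K may do the same. The following is a minimal sketch assuming rows are Python dicts with no duplicate tuples; the EMPLOYEE-like sample is hypothetical. Because it inspects a single instance, it can refute that K is a candidate key of the scheme, but it cannot prove it.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """True if no two tuples agree on attrs, i.e. t1(K) != t2(K) for distinct t1, t2."""
    seen = set()
    for t in rows:
        projection = tuple(t[a] for a in attrs)
        if projection in seen:
            return False
        seen.add(projection)
    return True


def is_candidate_key(rows, attrs):
    """attrs is unique and no proper subset of attrs shares the property (on this instance)."""
    if not is_superkey(rows, attrs):
        return False
    return not any(is_superkey(rows, sub)
                   for r in range(len(attrs))
                   for sub in combinations(attrs, r))


rows = [
    {"EmpNo": 1, "Name": "Ann", "Dept": "D1"},
    {"EmpNo": 2, "Name": "Bob", "Dept": "D1"},
    {"EmpNo": 3, "Name": "Ann", "Dept": "D2"},
]
print(is_candidate_key(rows, ("EmpNo",)))          # True
print(is_candidate_key(rows, ("EmpNo", "Name")))   # False: contains the key (EmpNo)
print(is_candidate_key(rows, ("Name", "Dept")))    # True on this instance only
```

The last line shows the limit of instance-based checking: (Name, Dept) happens to be unique in these three rows, which says nothing about the business rules of the scheme.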

13.3.4 RM designated keys "The keys explicitly listed with a relation scheme are called designated keys" [1983Mai]: 1) a designated primary key of R may match a candidate key; 2) a designated primary key of R may contain a candidate key [1983Mai]; and even 3) a designated “primary key” of R can be a subset of a candidate key, i.e. such a designated “primary key” is a pseudo primary key of R.

13.3.5 Inflated candidate key Case 2 of the previous subsection is an inflated candidate key. Such a designated primary key is formally a redundant primary key: the primary key of the table is specified as the composition of the known primary key plus some other attribute, intending to boost the performance of some known queries.

13.3.6 No minimal candidate key A non-minimal CK occurs when the real CK is accompanied by an attribute which is independent of the current components of the candidate key.

13.3.7 Redundant candidate key A designated composite candidate key is redundant if (at least) one attribute y of the designated CK internally depends on a part x of the CK, i.e. x→y, x and y being disjoint parts of the specified candidate key. The design defect of a relation having an inflated candidate key is the "internal dependence". For a table R to be in 1NF, its designated primary key must be minimal, which implies its irredundancy in the case of a composite PK.
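When the functional dependencies are known, both the internal dependence and an inflated key can be detected with the classic attribute-closure algorithm. This is a sketch, assuming FDs are written as pairs of frozensets; the INVOICE_LINE scheme, its FDs and the designated key (Invoice, Line, Customer) are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure X+ of attrs under fds, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def internally_dependent(key, fds):
    """Components y of the designated key implied by the rest of the key (x -> y)."""
    return {y for y in key if y in closure(set(key) - {y}, fds)}


def minimal_key(key, attributes, fds):
    """Drop every key component whose removal still lets the key determine all attributes."""
    k = list(key)
    for a in list(k):
        rest = [b for b in k if b != a]
        if closure(rest, fds) >= set(attributes):
            k = rest
    return k


attributes = ["Invoice", "Line", "Customer", "Product", "Qty"]
fds = [
    (frozenset({"Invoice"}), frozenset({"Customer"})),
    (frozenset({"Invoice", "Line"}), frozenset({"Product", "Qty"})),
]
designated = ["Invoice", "Line", "Customer"]
print(internally_dependent(designated, fds))         # {'Customer'}: Invoice -> Customer
print(minimal_key(designated, attributes, fds))      # ['Invoice', 'Line']
```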

13.4 PK component has not a genuine value A PK component without a genuine value means that such an attribute IS NOT NULL WITH DEFAULT, in SQL terms, which is the exact definition of this zero normal form; for instance, R (EMPLOYEE∊E, SKILL∊S, LANGUAGE∊L), following Kent's original source [1983Ken]: R (EMPLOYEE∊E IS NOT NULL, SKILL∊S IS NOT NULL WITH DEFAULT '␢', LANGUAGE∊L IS NOT NULL WITH DEFAULT '␢'); INSTEAD OF

R (EMPLOYEE∊E IS NOT NULL, SKILL∊S IS NOT NULL, LANGUAGE∊L IS NOT NULL).

13.4.1 Alternate design with genuine PK values If some AK existed in R, there would be another way to get genuine values in the PK components: the designer selects another AK (being IS NOT NULL) as the primary key.

13.4.2 Genuine PK values with a surrogate Even if no AK existed in R, as in this case, switching to another PK is possible just by adding a surrogate key to R, i.e. R (EMPLOYEE∊E IS NOT NULL, SKILL∊S IS NOT NULL WITH DEFAULT '␢', LANGUAGE∊L IS NOT NULL WITH DEFAULT '␢'); INSTEAD OF R (SK_R IS NOT NULL, EMPLOYEE∊E IS NOT NULL, SKILL∊S IS NOT NULL WITH DEFAULT '␢', LANGUAGE∊L IS NOT NULL WITH DEFAULT '␢').
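A minimal SQLite sketch of the surrogate design, assuming SQLite's `NOT NULL DEFAULT` as a stand-in for the DB2-style `IS NOT NULL WITH DEFAULT` of the text and a plain space for '␢'; the table name `r` and the sample rows are hypothetical. It shows SK_R being assigned automatically, and also that the blank default still surfaces as if it were a skill value: the surrogate gives the PK genuine values, while SKILL and LANGUAGE keep their placeholders.

```python
import sqlite3

BLANK = " "  # stands in for the '␢' default of the text
conn = sqlite3.connect(":memory:")
# Surrogate design: SK_R is the PK; SKILL and LANGUAGE keep their blank defaults
# but are no longer primary key components.
conn.execute(f"""
    CREATE TABLE r (
        SK_R     INTEGER PRIMARY KEY,            -- assigned automatically by SQLite
        EMPLOYEE TEXT NOT NULL,
        SKILL    TEXT NOT NULL DEFAULT '{BLANK}',
        LANGUAGE TEXT NOT NULL DEFAULT '{BLANK}'
    )""")
conn.execute("INSERT INTO r (EMPLOYEE, SKILL) VALUES ('Smith', 'cook')")
conn.execute("INSERT INTO r (EMPLOYEE, LANGUAGE) VALUES ('Smith', 'French')")

rows = conn.execute("SELECT SK_R, EMPLOYEE, SKILL, LANGUAGE FROM r ORDER BY SK_R").fetchall()
# The blank is still listed as if it were a skill.
skills = [s for (s,) in conn.execute("SELECT DISTINCT SKILL FROM r ORDER BY SKILL")]
print(rows)
print(skills)
```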

13.5 Vector of attributes A vector of attributes [1986Dat], e.g. "Extension1, Extension2, Extension3", appears in the table ROOM of the following relational schema S. S: {ROOM (RoomId, BuildingId, FloorNo, Extension1, Extension2, Extension3); EMPLOYEE (EmpNo, Name, FirstName, …); EMP_EXT (EmpNo, Extension#)}; INSTEAD OF S: {ROOM (RoomId, BuildingId, FloorNo); EXT (Extension#, RoomId∊ROOM); EMPLOYEE (EmpNo, Name, FirstName, Extension∊EXT)}.

13.6 Range of attributes A range of two attributes represents horizontally a variable set of values in a very compact form, compared with the RM way, which would normally use the vertical cells of the same column in a standard table. For example, R (EmpNo, ProjectId, FromDate, ToDate) INSTEAD OF R (EmpNo, ProjectId, Date).

13.6.1 First-cut in favor of the range approach For example, R (EmpNo, ProjectId, FromDate, ToDate) can represent in one row a sequence of n consecutive days of a programmer in a given customer project. R (EmpNo, ProjectId, Date) would need n standard rows carrying the same information, with n consecutive Date values (apart from repeating the same EmpNo-ProjectId value n times).

13.6.2 RM Mongolian hordes Evidently, since the intervals are statistically short, the Range approach loses its disk space economy, but in the case of data warehouses the Range approach has its advocates. However, the problem today is not disk space but the performance of all types of queries, e.g. grouped queries with the CUBE facility. Currently, all query optimizers are RM oriented. This means that, leaving aside the in-house development the Range approach would require (for transforming a set of rows into a horizontal row, for loading, for indexing and for horizontal query optimization, above all on the implicit values of a given range), the current DW suites display strong DW mastery and performance thanks to query materialization [1980A&L] (please see [1996Inm], [1998IBM] & [2011Mic]).

13.6.3 Interval datatype for temporal data Darwen's original paper [2003Dar], together with Date's elaboration in a chapter of the book [2000Dat], are the RM reference for a future SQL INTERVAL datatype. For interested and curious readers, here is a copy of the corresponding summary:
2. Temporal data
4. Intervals
5. Interval types
6. Scalar operators on interval
7. Aggregate operators on intervals
8. Relational operators involving intervals
9. Constraints involving intervals
10. Update operators involving intervals
11. Database design considerations

13.7 Horizontal relation A horizontal table by Codd [1990Cod] follows.

13.7.1 Simple horizontal relation Let us use an example of a shorter horizontal table, the table COLOR (Color1, Color2, Color3). This first table COLOR is a horizontal relation [1990Cod] having three columns (the PK and two CKs) and a body of two rows.
COLOR

Color1  Color2  Color3
Black   Blue    Green
Red     Yellow  White

The following table COLOR is a standard thesaurus of colors of 3NF class A with the same information, but presented in RM fashion, i.e. one column (the PK) and a body of six rows.
COLOR
Color
Black
Blue
Green
Red
Yellow
White
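Turning the horizontal COLOR into the vertical one is an unpivot. In SQL this can be sketched with one `SELECT` per repeated column glued together by `UNION ALL`; the sketch below uses SQLite, and the table name `color_h` for the horizontal version is hypothetical. Declaring `Color` as the primary key of the vertical table means a color repeated across the three columns would be rejected instead of silently duplicated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Horizontal COLOR of 13.7.1: two tuples, each carrying three business rows.
conn.execute("CREATE TABLE color_h (Color1 TEXT, Color2 TEXT, Color3 TEXT)")
conn.executemany("INSERT INTO color_h VALUES (?, ?, ?)",
                 [("Black", "Blue", "Green"), ("Red", "Yellow", "White")])

# Vertical COLOR: a single column, which is the primary key, one tuple per business row.
conn.execute("CREATE TABLE color (Color TEXT PRIMARY KEY)")
conn.execute("""INSERT INTO color (Color)
                SELECT Color1 FROM color_h
                UNION ALL SELECT Color2 FROM color_h
                UNION ALL SELECT Color3 FROM color_h""")

colors = [c for (c,) in conn.execute("SELECT Color FROM color ORDER BY Color")]
print(colors)
```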

13.7.2 Horizontal relation This second horizontal instance of table COLOR has three functional rows per RM tuple and several alternate keys.


The concise remedy for the second example of horizontal tables is a COLOR relation in 3NF of class A, with a tuple matching each business row, with just the prosaic primary key and a nonprosaic datatype in the Color attribute.

13.7.3 Sophisticated horizontal relation (1) There are two other variants of horizontal tables, coming from the writer's experience. The COLOR table that follows has three business rows per RM tuple.

The remedy for this sophisticated variant is left to the reader.

13.7.4 Sophisticated horizontal relation (2) In the second sophisticated case —that follows, a part of three business row per RM tuple, the horizontal designer has introduced a functional primary key (called Prefix) for every horizontal tuple.

The immediate remedy in this case is the following table (with the same name), which has one relational tuple per business row, as expected.

The primary key of this new table COLOR comes from reality, and the table has only one alternate key. The invented PK, i.e. Prefix, was unnecessary noise, and it goes back to being the prefix of the vertical COLOR.Code, which is the PK in reality.

13.7.5 Cubik normal form "The cube operator generalizes the histogram, cross-tabulation, roll-up, drill-down and sub-totals constructs. Creating the CUBE requires generating the power set of the aggregation column" Jim Gray (1996) [1996Gra]. This quotation from Jim Gray (1944-2007) is a luxury. Exchanging dimensional variables with non-dimensional ones with a click is known as the CROSSTAB operation [2011Mic]. CROSSTAB not only rotates the original grouped-with-CUBE table by -90º (!) but also generates the important missing crosses, the vertical and horizontal subtotals, and the grand total.

The CUBIK method is the converse of the CROSSTAB operation: Cubik normal form recovers the ideal grouped-by-CUBE relation from a horizontal relation.

13.8 Catalog of design pitfalls This catalog of pitfalls is probably too detailed, but the novice designer may appreciate being aware of some of them, especially that of reusing a current primary key column for more than one type of functional entity in order to save a table of the database schema. The list of design pitfalls is the following: Constant column; Multiunit column; Enumerated attribute; Microattribute; Metarow domain; Metadata row; Edited attribute; Hidden computed column; Multidomain column; Hidden UNF structure; and Hidden derived table.

13.8.1 Constant column A constant value in a column, e.g. LegQuantity in the table HORSE, must be discarded immediately because it is dependent on every other column of the HORSE table. This type of generic information belongs in a more general table, which may exist in another schema of the shop in our example.

13.8.2 Multiunit column There is no way of documenting the unit of measure (abbr. UOM) of quantity columns in SQL databases (as already stated when discussing Datatypes), so the pitfall is the following. Admitting several explicit UOMs in the same column, for instance, 1024K, 1048576, 1024², 1M, etc. as values of the R.ProgramSize column. INSTEAD OF Having a documented UOM for every quantitative database column, for instance, "The implicit ProgramSize value is in Kilobytes", i.e. the data row "R.ProgramId(IEFBR14), R.ProgramSize(1)" means that "the IEFBR14 program is 1K in size". There is also the case of 1NF floating-point columns, whose bit configuration is kept in a normalized form whatever the literal, e.g. +37589333333333.3315E1; using the corresponding SQL datatypes (if any), or the programming language facilities otherwise, to keep these values normalized is mandatory.
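The multiunit pitfall can be made concrete with a small normalizing function. This is only a sketch under stated assumptions: the documented UOM of the column is kilobytes, a bare number is a byte count, prefixes are binary (1K = 1024 bytes), and a power such as 1024² is typed as `1024^2`. The function name `to_kilobytes` is hypothetical.

```python
import re

# Assumed binary multipliers: 1K = 1024 bytes, 1M = 1024**2 bytes, and so on.
_FACTORS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

def to_kilobytes(raw: str) -> float:
    """Normalize a mixed-unit size literal to the column's documented UOM (KB).

    Accepts a bare byte count ("1048576"), a suffixed value ("1024K", "1M")
    or a power written as "1024^2" (interpreted as bytes).
    """
    text = raw.strip().upper()
    power = re.fullmatch(r"(\d+)\^(\d+)", text)
    if power:
        n_bytes = int(power.group(1)) ** int(power.group(2))
    else:
        match = re.fullmatch(r"(\d+)([KMG]?)", text)
        if match is None:
            raise ValueError(f"unrecognized size literal: {raw!r}")
        n_bytes = int(match.group(1)) * _FACTORS[match.group(2)]
    return n_bytes / 1024

mixed = ["1024K", "1048576", "1024^2", "1M"]
print([to_kilobytes(v) for v in mixed])  # [1024.0, 1024.0, 1024.0, 1024.0]
```

All four spellings from the pitfall denote the same size; once stored as a single documented UOM, the column holds one homogeneous value per cell.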


13.8.3 Enumerated attribute Several atomic values in a prosaic column cell, e.g. items separated by delimiter characters in a CHAR column that is expected to hold a homogeneous value in every cell. An example of this pitfall [1983Ken] follows.

INSTEAD OF Using TEXT or VARCHAR SQL data types for enumerations and other textual values. In the previous case, "the same person appears to be living at two different addresses again precluding a functional dependency" [1983Ken], i.e. PERSON→ADDRESS.
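A minimal sketch of the repair follows. It assumes the enumerated cell uses `;` as separator (the separators of the original example are not recoverable), and the names `split_enumerated`, `Person` and `Address` are hypothetical. Each atomic value becomes its own row.

```python
def split_enumerated(rows, key, column, sep=";"):
    """Turn each row holding an enumerated cell into one row per atomic value."""
    atomic = []
    for row in rows:
        for item in row[column].split(sep):
            item = item.strip()
            if item:  # skip empty fragments such as a trailing separator
                atomic.append({key: row[key], column: item})
    return atomic

person = [{"Person": "Smith", "Address": "12 Elm St; 4 Oak Ave"}]
print(split_enumerated(person, "Person", "Address"))
# [{'Person': 'Smith', 'Address': '12 Elm St'}, {'Person': 'Smith', 'Address': '4 Oak Ave'}]
```

After the split, the same person appears at two addresses, which is exactly why PERSON→ADDRESS cannot hold and the key of the resulting table must include Address.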

13.8.4 Micro-attribute A single micro-attribute is meaningless, e.g. R (… Year, Month, Day, …). INSTEAD OF Merging the specific micro-attributes so that a meaning appears, e.g. R (… Date, …).
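The merge of micro-attributes can be sketched as follows, assuming rows are Python dicts with integer `Year`, `Month` and `Day` fields. The function name `merge_micro_attributes` and the `Id` column are hypothetical. A side benefit is that the combined value is validated as a whole.

```python
from datetime import date

def merge_micro_attributes(row):
    """Replace the Year/Month/Day micro-attributes by a single Date attribute."""
    merged = {k: v for k, v in row.items() if k not in ("Year", "Month", "Day")}
    # date() validates the combination, e.g. it raises ValueError for 2017-02-30.
    merged["Date"] = date(row["Year"], row["Month"], row["Day"])
    return merged

print(merge_micro_attributes({"Id": 7, "Year": 2017, "Month": 6, "Day": 13}))
# {'Id': 7, 'Date': datetime.date(2017, 6, 13)}
```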

13.8.5 Metarow domain For example, in R (RowId, RowLength, RowType, CityName, …), RowId, RowLength and RowType ought to be rejected as invalid column names in the SQL definition. This fault also extends to the column names and to the corresponding domains (even under different names).

13.8.6 Metadata row 'Metadata row' refers to a table with some columns that explicitly mix an attribute name and its value in the same row; for example, R (SeqId, Operation, TableName, AttributeName, Value) breaks the rule by having TableName, AttributeName and Value as attribute names.

13.8.7 Edited attribute For example, assuming that our standard date format were yyyymmdd, any other date format is out of protocol and must be changed.
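A protocol check for the assumed yyyymmdd standard can be sketched with the standard library. The function name `is_protocol_date` is hypothetical; the check rejects both other layouts and impossible calendar dates.

```python
from datetime import datetime

def is_protocol_date(value: str) -> bool:
    """Return True if value follows the assumed yyyymmdd protocol and is a real date."""
    # strptime alone is too lenient: "%Y%m%d" also accepts unpadded input such
    # as "2017613", so the fixed width and all-digit shape are checked first.
    if len(value) != 8 or not value.isdigit():
        return False
    try:
        datetime.strptime(value, "%Y%m%d")
    except ValueError:
        return False  # e.g. month 13 or 30 February
    return True

for candidate in ["20170613", "13/06/2017", "20170230"]:
    print(candidate, is_protocol_date(candidate))
```

Values that fail the check are the "edited" ones that must be converted to the protocol format.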

13.8.8 Hidden computed column Computed columns are welcome in 1NF provided they are explicitly defined. An item value determined by a hidden value expression over other item(s) of the same row is not part of the 1NF protocol.

13.8.9 Multidomain column The designer knows that some range of values of a column belongs to one domain and the remaining values belong to another domain [1986Dat].

13.8.10 Hidden UNF structure A CHAR attribute whose value includes ASCII/EBCDIC control marks must be defined as TEXT or VARCHAR.

13.8.11 Hidden derived table The set of rows of R is a Cartesian product of other tables of the current schema S. R is a view and it must be redefined as such.
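One way to detect this pitfall on sample data is to test whether the rows of R equal the product of R's own projections on two column groups. This is a sketch, assuming the two groups together cover every column of R; the function name and the `Size`/`Color` columns are hypothetical.

```python
from itertools import product

def is_cartesian_product(rows, left_cols, right_cols):
    """True if R's rows equal the product of its projections on two column groups.

    Assumes left_cols + right_cols together cover every column of R.
    """
    cols = left_cols + right_cols
    actual = {tuple(r[c] for c in cols) for r in rows}
    left = {tuple(r[c] for c in left_cols) for r in rows}
    right = {tuple(r[c] for c in right_cols) for r in rows}
    expected = {a + b for a, b in product(left, right)}
    return actual == expected

rows = [{"Size": s, "Color": c} for s in ("S", "M") for c in ("red", "blue")]
print(is_cartesian_product(rows, ["Size"], ["Color"]))       # True
print(is_cartesian_product(rows[:-1], ["Size"], ["Color"]))  # False
```

A positive result on a sample only suggests the pitfall; the designer still has to confirm that the product holds for every state of the business reality before redefining R as a view.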

14. DESIGNER RELATION The DESIGNER RELATION report focuses on the aspects of a relation that influence the quality of data design. This designer-oriented chapter divides the flat set of attributes into a small structure of subsets with different functional roles and properties. The report continues the description of the base relation, now from the point of view of the main design levers that lead more easily towards cohesion and irredundancy of the deliverables, while keeping in the designer's rear-view mirror the partial dependence and the transitive dependence.

This report has the following entries: • The normal form; • New relation structures; • The three 1NF concepts; • First normal form; • Primary key (PK); • Predicating part (DAß); • Candidate key (CK); • Primary domain; • Primary domain transparency; • Alternate key vanishment; • AK descriptive fashion; • Minor-key domain (ĸ); • Nonkey domain (∼ĸ); • Nonkey domain is reducible; and • Codepending vs. denoting.

14.1 New relation structures Following E. F. Codd in [1971Cod], this second part of the BASE RELATION report focuses on: − the structural concepts that functional dependency manages, especially between the graph obtained from observing the set of rows and the FD types; and − the different attribute subsets that are the protagonists of a canonical relation. The two Logic subsets of table R are the following: − the predicate subject, or PK; − the predicating part, or DAß. The structural partitions of table R are the following: − Primary key; − Nonkey domain; − Minor-key domain (or AK attribute subset). Database predicate structure: the Logic structure of R starts by dividing the set of attributes of the predicate Pxy into two parts: the primary key (x), i.e. the subject; and the remaining attributes (y), i.e. the predicating part.

14.2 The three PK aspects of 1NF Apart from being the pristine normal form [1970Cod], and from the point of view of its kind of primary key, 1NF has three aspects: − The 1NF PK is ignored as part of the normalization process; − The 1NF PK is minimal as the 2NF fulcrum; and − The 1NF PK is minimal with PD & TD defects.

14.2.1 1NF PK ignored An unchecked R that is ready to be normalized is in first normal form, and its designated primary key (if any) is ignored. In this case, 1NF activates the spare candidate key (the full prosaic row), which is respected as the primary key until the first candidate key is found.

14.2.2 1NF minimal PK as 2NF fulcrum The start definition of 2NF: "A relation R is in second normal form if R is in 1NF, and every attribute of R depends on the full primary key" implies the following. Corollary: The full primary key of R in 1NF is irredundant and minimal.

14.2.3 1NF minimal PK with PD & TD A table R of a production database whose owner rejects normalization, knowing that R has partial dependence and transitive dependence.

In this case, 1NF is taken at the lowest level of the normalization scale, as can be seen in the following cladogram [2015WCL] of Figure 6, which considers "Partial Dependency (PD)" and "Transitive Dependency (TD)" as two independent characteristics of any table.

Figure 5. Cladogram of 3NF, 2NF, 1NF

14.3 First normal form We follow Codd's positive definition of 1970 [1970Cod] and assume the 2NF fulcrum role of First normal form (abbr. 1NF). "A relation R is in First normal form, if all its domains are simple i.e. domains whose elements are" [1971Cod] indivisible values, and all its candidate keys have a minimal number of components, none of which is null. "R in 1NF" and "R is RM compliant" are synonyms.

14.3.1 InternalNF procedure The InternalNF proc (also called the FirstNF proc) checks/repairs Zero normal forms and the other inflated candidate keys (if any). FirstNF is the only normalization method that affects candidate keys, but only on condition that such candidate keys are designated CKs (not a result of the functional dependency analysis of the R tabulation itself).

14.4 Primary key (PK) The primary key is the Logic subject that identifies each row in the set R. It is the only mandatory part of the normal structure. If it is multi-attribute, the PK is always irredundant and minimal; it cannot be otherwise, given its uniqueness over simple domains.

14.5 Predicating part (DAß) Let us call the predicating part of R its Descriptive Attribute Subset (abbr. DAß). The DAß includes all the attributes of R except the PK attributes. Each attribute outside the primary key cumulatively describes the identifier by sorting it with the corresponding value. Each attribute assigns the subject to a Logical field, e.g. the value "red" in CarColor assigns a given CAR to the abstract RED field. The Logic definition of the predicating part of R, i.e. the DAß, is as follows. DAß ≝ {{R} MINUS {PΚ}} The DAß can be empty.

14.6 Candidate key (CK) Each candidate key (abbr. CK) is a non-empty and irredundant subset of attributes guaranteeing the uniqueness of every row of R. «For each relation R in a data base, one of its candidate keys is arbitrarily designated as the primary key of R» [1971Cod]. But this does not mean that the designer must know all the candidate keys before designating the PK. Most of the time, we can expect a candidate key to be a single attribute; in this case, such a CK is irredundant and minimal.

14.7 Primary domain Primary domain (see: [1979Cod]) is the name of the set of attributes that are a CK or part of a CK. One of the CKs is the PK (as already stated). Therefore, this domain cannot be empty, because the primary key is a candidate key.

Let Κ (uppercase Kappa) be the variable name of the primary domain of R in the definition that follows. Κ ≝ {CK¹ ∪ CK² ∪ CK³ ...} Bernstein calls each element of Κ a "prime attribute" [1976Ber]; Kent [1983Ken] calls each element of the primary domain a "key field". The primary domain (being the source of alternate keys) is a stepping stone for defining the minor-key domain.

14.8 Primary domain transparency During normalization, the status of any CK component can be promoted to foreign key, and this is the only possible change in the primary domain of R. This is so because, R being in 1NF, all composite CKs are in their minimal steady state at the start of normalization (and remain so at the end).

14.9 Alternate key vanishment «In the relational model the term "key" is normally qualified by the adjectives "candidate", "primary" and "foreign," and each of these phrases has a precisely defined meaning» [1990Cod]. This statement complements Codd & Date's already known idea of "One and only one primary key" [1990Cod]. It is clear that, after primary key selection, Codd wants alternate keys to be conspicuous by their absence in the skybox of the R stadium.

14.9.1 No room for alternate key specs After looking for all candidate keys, the target was the primary key of R, i.e. the real-world subject of the descriptive predicate of R. That is, there is no room for alternate key specs of R.

14.9.2 Occam's razor instance Without going deeper into the matter, leaving alternate keys undesignated seems part of the parsimony principle: if the micro-world itself supports the uniqueness of an alternate key in R, why waste design time, disk space for the indexes, etc. on something we already have for free?

14.9.3 PK-FK-only data model The OntoNotes project collects complex inter-dependent annotations and stores them in a standard relational database. Its database captures both the inter- and intra-layer dependencies, as well as the dependencies between the different annotations. The annotations are distributed as independent pieces, leaving the task of assembling them to the end user. The research challenge is modeling such multilayer annotations, because of their complex cross-layer dependencies. The direct solution has been to define the dependencies through Primary and Foreign keys only, over 3NF database tables. The project considers that a database schema of 3NF tables equipped only with primary keys and foreign keys is a direct reflection of the complicated world of dependencies that it manages (see: [1993]). The following sections explain all the features of the Alternate keys.

14.10 AK descriptive fashion The vanishment of alternate keys as explicit constraints of R does not weaken the functionality of the set of attributes that form an AK or part of one. Just the opposite: each alternate key (abbr. AK) is a perfect citizen of the Descriptive attribute subset. Inside the DAß, each alternate key denotes the PK, i.e. it "describes" the PK, increasing its current uniqueness. A database denotation is a Logic definite denotation, i.e. the denotation can be composite but it is unique. For instance, "The Kid" denotes "Billy"; or "The Aufbau" denotes the book whose title is "The Logical structure of the World".

14.10.1 Denoting & sorting Each composite AK describes the PK by denoting it, and each of its components sorts the PK in the standard way.

14.11 Minor-key domain (ĸ) (STUB) Let us define the minor-key domain (ĸ, i.e. Greek small kappa of "κλειδί", i.e. "key") as the set of attributes of DAß that are part of some AK, by means of the difference between the set of attributes of the candidate keys, i.e. Κ, and the set of attributes of the PK, i.e. {PΚ}. ĸ ≝ {Κ MINUS {PΚ}} The minor-key domain (ĸ) can be empty. The ĸ domain is where alternate keys "vanish" after primary key designation.

14.11.1 On AK as business constraint A composite AK with no attribute being an FK represents a business constraint (.98). For instance, given: FLIGHT (Flight#, Time, Destination); where FLIGHT: {Flight#→Time; Flight#→Destination; (Time·Destination)→Flight#}; Comments: Flight# is the PK; (Time·Destination) is an AK that seems to say: "In this airport, every day there is no more than one Flight with the same departure Time and the same Destination."
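The FLIGHT dependencies above are enough to recover both candidate keys mechanically. The sketch below uses the standard attribute-closure technique and a brute-force search over attribute subsets, which is adequate for small tables only; the function names `closure` and `candidate_keys` are illustrative, not part of the original method.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attributes, fds):
    """All irredundant attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            combo = set(combo)
            if any(k <= combo for k in keys):
                continue  # contains a smaller key, so it is not irredundant
            if closure(combo, fds) == set(attributes):
                keys.append(combo)
    return keys

FLIGHT = {"Flight#", "Time", "Destination"}
FLIGHT_FDS = [
    ({"Flight#"}, {"Time"}),
    ({"Flight#"}, {"Destination"}),
    ({"Time", "Destination"}, {"Flight#"}),
]
print(candidate_keys(FLIGHT, FLIGHT_FDS))  # {Flight#} and {Time, Destination}
```

Designating {Flight#} as the PK leaves (Time·Destination) as the alternate key that expresses the business constraint.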

14.11.2 On AK as one-to-one IF the single attribute P is an AK in R AND P references S THEN P represents a one-to-one relationship between R and S (.98) END-IF. We can expect the cardinality of S to be bigger than the cardinality of R (.7). Note that in this 1:1 ERA relationship the important role of P is being a foreign key; the AK role supervenes because it is the only way of representing the one-to-one association.

14.11.3 On pure single AK IF the single attribute P is AK in R THEN P is a pure denotative descriptor of the PK (.99) END-IF.

14.12 Nonkey domain (∼ĸ) Let us define the nonkey domain (∼ĸ) as the set of attributes of DAß that are not part of any AK, by means of the difference between the set of attributes of the DAß and the set of attributes of the minor-key domain {ĸ}. ∼ĸ ≝ {DAß MINUS ĸ} [1971Cod]

The nonkey subset can be empty. When it has more than one attribute, the nonkey domain is the protagonist of the transitive dependence defect.
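The definitions of Κ, DAß, ĸ and ∼ĸ are plain set differences, so they can be sketched directly with Python sets. The example reuses FLIGHT from 14.11.1 plus a hypothetical nonkey attribute `Gate`; the function name `partition_attributes` is illustrative.

```python
def partition_attributes(attributes, candidate_keys, primary_key):
    """Split R into its primary domain, DAß, minor-key domain and nonkey domain."""
    pk = set(primary_key)
    primary_domain = set().union(*candidate_keys)  # Κ = CK¹ ∪ CK² ∪ ...
    dab = set(attributes) - pk                     # DAß = R MINUS PK
    minor_key = primary_domain - pk                # ĸ = Κ MINUS PK
    nonkey = dab - minor_key                       # ∼ĸ = DAß MINUS ĸ
    return {"primary_domain": primary_domain, "DAB": dab,
            "minor_key": minor_key, "nonkey": nonkey}

parts = partition_attributes(
    {"Flight#", "Time", "Destination", "Gate"},
    [{"Flight#"}, {"Time", "Destination"}],
    {"Flight#"},
)
print(sorted(parts["minor_key"]), sorted(parts["nonkey"]))  # ['Destination', 'Time'] ['Gate']
```

The alternate key (Time·Destination) "vanishes" into ĸ, and only Gate remains in the reducible nonkey domain.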

14.13 Nonkey domain is reducible Only the nonkey domain of R (the subset of attributes of R that are not part of any candidate key) is reducible. The nonkey subset can be reduced: (1) from redundant to less redundant; (2) from redundant to irredundant; (3) from irredundant to minimal.

14.13.1 Minimal nonkey subset of R A nonkey subset is minimal if no attribute is partially dependent on some candidate key and no attribute is transitively dependent on the primary key. In other words, the nonkey domain is minimal if it is a miβ and no attribute depends on part of a candidate key. In all three previous cases, not only does the nonkey domain increase its cohesion, but so does R as a whole. For instance, after the last reduction, i.e. when the nonkey domain is minimal, R is already in third normal form.

14.13.2 Frozen part of nonkey domain Every foreign key of a polyadic nonkey domain is persistent. If every nonkey attribute were a foreign key, then the nonkey domain would be frozen and, consequently, already free of partial and transitive dependence.

14.14 Codepending vs. denoting Going into the substance of the status of alternate keys (if any), the following statements may clarify the PK-AK semantic space. The key point is the fifth. i. Each nonkey attribute immediately depends on the primary key; ii. Each nonkey attribute immediately depends on each alternate key, but this information is not primitive: it can be recovered from the primitive FD structure; iii. Each part-key attribute depends on any of the remaining candidate keys; this information is enclosed in the minor-key domain of R, so as not to jeopardize the internal structure of the nonkey domain; iv. The members of each pair (AKi, AKj) are immediately dependent on each other, but this can be deduced from the dependency structure without information loss; v. The members of each ordered pair ⟨PK, AK⟩ are immediately dependent on each other, but such codependence is asymmetrical from a semantic point of view, because each alternate key denotes the primary key but the primary key does not denote any alternate key.

15. DATABASE NORMALIZATION "Further operations of a normalizing kind are possible" [1970Cod]. The DATABASE NORMALIZATION report is a catalog of the normal forms useful to a data designer. The method of exposition consists of first defining the key concepts used in each normal form definition. For example, "Transitively dependent" comes before the Third normal form. The DATABASE NORMALIZATION report has the following sections: • Partial dependence; • Weak partial dependence; • Fully dependent; • Second normal form; • R dependency structure; • Primitive dependency; • Transitively dependent; • Weak transitive redundancy; • Intransitively dependent; • Determinator axiom; • Adjacent dependent node; • Immediately dependent and adjacent; • Third normal form; • A relation is in 3NF if it is in 2NF; • A 3NF relation is also in 2NF; and • Minimalist 3NF formula.

15.1 Partial dependence A 1NF table R is partially dependent if some nonkey attribute functionally depends on an attribute that is part of a candidate key. Given x≠y, y≠z, z≠x attributes of R; let x∊CK, z∊CK: ∃y∊{R MINUS {CK}}: x→y In other words, some nonkey attribute depends on a determinant that is part of the primary domain. The dependent nonkey attributes are redundant attributes.
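Given a set of FDs and the candidate keys, the partial dependence check can be sketched by computing the closure of every proper, non-empty part of each composite key and intersecting it with the nonkey attributes. The example is T1 (SupplierNr, PartNr, City) from 15.2, with the FD SupplierNr→City assumed from its weakly redundant counterpart T2 (SupplierNr, City); the function names are illustrative.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def partial_dependencies(attributes, fds, candidate_keys):
    """Return (key_part, attribute) pairs where a nonkey attribute depends on
    a proper subset of some composite candidate key."""
    prime = set().union(*candidate_keys)  # primary domain
    nonkey = set(attributes) - prime
    found = []
    for key in candidate_keys:
        for size in range(1, len(key)):  # proper, non-empty parts only
            for part in combinations(sorted(key), size):
                for attr in sorted(nonkey & closure(part, fds)):
                    found.append((part, attr))
    return found

T1 = {"SupplierNr", "PartNr", "City"}
T1_FDS = [({"SupplierNr"}, {"City"})]
print(partial_dependencies(T1, T1_FDS, [{"SupplierNr", "PartNr"}]))
# [(('SupplierNr',), 'City')]
```

An empty result for every composite candidate key is exactly the "fully dependent" condition of 15.3.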

15.2 Weak partial dependence "Weak redundancy" [1970Cod] & [1979Cod] design for partial dependence is a type of controlled redundancy that recovers, from a central repository, the data unintentionally lost through the deletion anomaly. For instance, it recovers the lost information about some CITY after deleting the unwanted SUPPLIER that held the last information about that city.


Weak redundancy deals manually with all the database deletion anomalies, but only with a sound knowledge of the internals of this risky business.
Strong redundancy: T1 (SupplierNr, PartNr, City)
Weak redundancy: WR: {T2 (SupplierNr, City); T1 (SupplierNr∊T2, PartNr, City)}

15.2.1 Weak redundancy coexists with PD By definition, weak redundancy (abbr. WR) always coexists with "strong redundancy" [1979Cod]. In turn, "strong redundancy" is just a label for the data redundancy coming from partial dependence or transitive dependence (or both). Following our protocol, a WR schema suffers only the data redundancy coming from partial dependence, i.e. the insertion and update anomalies, but not the deletion anomaly, because after a data loss the information is recovered.

15.3 Fully dependent A 1NF table is fully dependent, if every nonkey attribute functionally depends on the full value of every CK but not on any part of it. Given x≠y, y≠z, z≠x attributes of R; let x∊CK, z∊CK: ∄y∊{R MINUS {CK}}: x→y

15.4 Second normal form "A relation R is in second normal form if R is in 1NF, and each attribute of R depends on the full primary key but not on any subset of it" [1971Cod]. Second normal form is abbreviated 2NF. "R is in 1NF" means that R, before the 2NF check, already has a minimal primary key. In the [1971Cod] paper, the current Second normal form was called Optimal second normal form.

15.4.1 2NF express The designer may recognize 2NF tables just by reading the attributes of the 1NF table. There are two cases in which R is in 2NF "without more investigations" [1971Cod]: P1. All the candidate keys are simple, e.g. R (P, L, J); P2. The nonkey domain is empty, e.g. R (P, L, J). Reading the 2NF express entry "All the candidate keys are simple" is necessary to see that Codd defined 2NF thinking of the candidate keys while writing "primary key": the partial dependence design defect can occur with any composite candidate key (including the primary key).

15.4.2 2NF formal definition A relation R is in second normal form if R is in 1NF, and each attribute of R fully depends on each candidate key but not on any subset of it. "R is in 1NF" means that R, before the 2NF check, has a nonempty set of minimal candidate keys. Obviously, the content of 2NF express does not vary.

15.4.3 SecondNF procedure The SecondNF procedure (abbr. SNF proc) checks whether some nonkey attribute depends on some part of a candidate key, i.e. a redundant attribute is always a nonkey attribute. The determinant of a partially dependent substructure remains in R, now horned with elementOf. The dependent part migrates from the nonkey subset of R to the new nonkey domain of its determinant. Any substructure free of partial dependence is closed, i.e. a transitive dependence or another partial dependence does not change any previous substructure. Given a dependency tree structure, all the refactoring into substructures could be performed by parallel processes. Despite the name of the partial key defect, SecondNF does not reduce any candidate key of R. After SecondNF, the redundant attributes of the nonkey domain (if any) depend on another nonkey attribute, which is a transitive pattern defect.

15.5 R dependency structure Given a Fair tabulation, its dependency analysis gives a Database structure [1974Arm] of functional dependencies [1971Cod]. Given a Dependency structure, its Dependency algebra [2009Mad] gives a Condensed structure [1991Sta] in the form of a dependency Tree [1965Har]. Given any dependency Tree, factoring out its deep sentences [1971Cod] delivers a non-empty Set of irredundant, cohesive and complete 3NF tables [1971Cod]. In order to visualize the difference between partial dependence, transitive dependence, primitive dependence and immediate dependence, let us draw a complex digraph of R ((N·S·P), N, E, D, J, M, P, V, F, T, L, A) and then the FD specs (R dependency specs subsection) below.

Legend 1: Attribute names are in capital letters; for instance, (N·S·P), which is the PK of R, is also the determinator, i.e. the node that does not depend on any other node. It is the source point of the Tree-with-one-source (the exact type of digraph of the FD structure). This type of tree is a set of nodes connected by boldfaced lines in green and in red. The remaining lines of the picture are dependency lines derivable by the different axioms, and they are not part of the dependency tree. Each determinant is a carrier [green] node represented by a letter inside a circle in the digraph; Each nonkey domain is a sink [grey] node represented by a black letter (without a circle); Both the determinator and each node can have at most one sink [grey] node; Finally, a codeterminant is a sink [red] node hanging from its adjacent node, e.g. E↔(N); The partner of a codeterminant node is always a carrier node or the determinator node;



Several codeterminant nodes can hang from the same node, forming a family of codeterminants.

Transitive dependence is a property of a given dependency structure of more than one level.

15.6 Primitive dependency Each dependency brick enters the primitive set of FDs by means of an empirical observation. The set of primitive FDs is the core of the dependency structure. It cannot be deduced from the remaining FDs but, given the deep dependency sentences, the remaining FDs can be known axiomatically. In the picture, the boldfaced lines are the primitive FD set. From the above dependency digraph comes the following primitive dependency spec:
R: { E↔(N); J↢(N)⊊(N·S·P)⊋(P)↣(F)↣(T)↣A; (P)↣V; (F)↣L }
Notes: (1) A double ℱ3 Armstrong axiom presides over the structure; (2) in the center of the same line is the determinator, i.e. the original PK, underlined and in blue; (3) the nodes are enclosed in parentheses; (4) determines-lines are in green, codetermines-lines in red.

15.6.1 Intransitive deep sentences of R Given a primitive set of FDs, each intransitive substructure claims its mathematical autonomy: (S, N∈RN, P∈RP); (N, E, D∈RD, J); (P, V, F∈RF); (D, M); (F, L, T∈RT); (T, A);

15.6.2 Transitive dependencies The thin green/red lines represent transitive dependencies.

15.6.3 Independent lines There are independent lines inside the essential composite nodes, e.g. S↮N. As part of a Boolean algebraic job, independent lines also fill in the rest of the digraph, i.e. the non-primitive dependencies.

15.6.4 Trivial lines Finally, trivial lines such as (A·B)↣B can be part of the FD structure when a primitive FD pulls out the trivial FD; for example, in the chain "(D)↢(N)↢(N·S·P)", 'D↢N' is the primitive FD pulling (N) out of '(N)↢(N·S·P)'. Here (N) plays the fundamental role of chaining 'D↢N' to the FD structure.

15.7 Transitively dependent A table R already in 1NF is "transitively dependent" [1971Cod] if a "nonkey attribute" [1983Ken] functionally depends on another nonkey attribute. Given x≠y, y≠z, z≠x attributes of R; let x be the PK: ∃〈y, z〉∊{R MINUS x}: y→z For example, in the generic R structure of the above dependency diagram, (F) is transitively dependent on (N·S·P), and this is represented by a thin green line.

15.8 Weak transitive redundancy "Weak redundancy" designers know and manage the risks of transitive dependence in table C, so they keep a 3NF table D with a fresh copy of the information that table C could lose. After a data loss in table C, an ad-hoc procedure restores the lost information from table D. For instance, such a procedure recovers the lost name of some district after deleting the last inhabitant of the remodeled district.
Strong redundancy: CITIZEN (SSN, Name, DistrictCode, DistrictName)
Weak redundancy: WR: {DISTRICT (DistrictCode, DistrictName); CITIZEN (SSN, Name, DistrictCode∊DISTRICT, DistrictName)};
When the CITIZEN table is normalized, the normalizer saves the creation of a new referenced table, e.g. DISTRICT, just adding some new column to it (if any).

15.9 Weak redundancy method It is worth noting that weak redundancy [1979Cod] is a design method that tempers the consequences of data redundancy: a weakly redundant schema suffers transitive dependence, i.e. insertion and update anomalies, but does not suffer the information loss associated with the deletion anomaly. Strong redundancy [1979Cod] is just a label for the data redundancy coming from partial dependence or from transitive dependence (or from both).

15.9.1 Weak redundancy role Last but not least, the weak redundancy method (a term coined by Codd [1979Cod]) can play a pedagogical role in shops with divergent positions on normalization, since everybody agrees that information loss is too high a price, even in exchange for better transaction performance.

15.10 Intransitively dependent A table R already in 1NF is intransitively dependent if none of its nonkey attributes functionally depends on another nonkey attribute. Formally, given x≠y, y≠z, z≠x; let x be the PK: ∄〈y, z〉∊{R MINUS x}: y→z.
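The transitive check of 15.7 and 15.10 can be sketched pairwise over single nonkey attributes, using attribute closure under the FDs. The example is the strongly redundant CITIZEN table of 15.8, with the FDs SSN→Name, SSN→DistrictCode and DistrictCode→DistrictName assumed from that example; the function names are illustrative.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def transitive_dependencies(nonkey, fds):
    """Return ordered pairs (y, z) of distinct nonkey attributes with y -> z."""
    pairs = []
    for y in sorted(nonkey):
        reached = closure({y}, fds)
        for z in sorted(nonkey):
            if z != y and z in reached:
                pairs.append((y, z))
    return pairs

CITIZEN_FDS = [
    ({"SSN"}, {"Name", "DistrictCode"}),
    ({"DistrictCode"}, {"DistrictName"}),
]
print(transitive_dependencies({"Name", "DistrictCode", "DistrictName"}, CITIZEN_FDS))
# [('DistrictCode', 'DistrictName')]
```

An empty list means the table is intransitively dependent in the sense of 15.10; here DistrictName is the redundant attribute to be moved to DISTRICT.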

15.11 Determinator axiom «One and only one element is independent (the governor)» [1970Rob]. The determinator is a node of R that does not depend on any other: one and only one determinator exists in every dependency structure. The existence and uniqueness of the determinator is axiomatic, and it inherits all the properties of the primary key at the first structural level, including its designation, at the designer's free will, among interdependent nodes (if any), e.g. {(z)↔(x)↣(y)} or {(x)↔(z)↣(y)}.

15.11.1 Determinator singularity (?) (STUB)

15.12 Adjacent dependent node «The principles of dependency theory entail that direct structural dependencies always link adjacent categories, they cannot skip over them» [1994Bro].

15.12.1 Intransitive structure > Transitive reduction
Transitive structure > transitive reduction > intransitive structure
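The chain "transitive structure > transitive reduction > intransitive structure" can be sketched for an acyclic part of the dependency digraph. The example takes the (P)↣(F)↣(T)↣A, (P)↣V, (F)↣L lines of 15.6 and adds every transitive line they imply; codeterminant pairs such as E↔(N) are left out because this reduction assumes a DAG. The function names `transitive_reduction` and `deep_sentences` are hypothetical.

```python
def transitive_reduction(edges):
    """Transitive reduction of a DAG given as {node: set_of_successors}."""
    def reachable(start):
        # Nodes reachable from start through one or more edges.
        seen, stack = set(), list(edges.get(start, ()))
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(edges.get(node, ()))
        return seen

    reduced = {}
    for u, succs in edges.items():
        # u -> v is redundant when v is also reachable through another successor of u.
        indirect = set()
        for w in succs:
            indirect |= reachable(w)
        reduced[u] = set(succs) - indirect
    return reduced

def deep_sentences(reduced):
    """Group each determinant with its immediate dependents: (determinant, dependents...)."""
    return [(u, *sorted(vs)) for u, vs in sorted(reduced.items()) if vs]

transitive = {"P": {"V", "F", "T", "A", "L"}, "F": {"L", "T", "A"}, "T": {"A"}}
print(deep_sentences(transitive_reduction(transitive)))
# [('F', 'L', 'T'), ('P', 'F', 'V'), ('T', 'A')]
```

The output matches the intransitive deep sentences (P, V, F), (F, L, T) and (T, A) listed in 15.6.1.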

15.13 Immediately dependent and adjacent Let y be a descriptor node. As first approach, "immediately dependent" on the primary key x is at most one subset y of R, being a miß, and intransitively dependent on x. 2016-01-2412:51:18 Page 39 of 76

15.14.1 Optimal 3NF (1971) In [1971Cod] paper, our current Third normal form, e. g. R (P, F, D, L, T, E, J), was called Optimal third normal form vs. A set of Binary 3NF's with the following layout: {R¹ (P, F); R² (P, D); R³ (P, , L); R⁴ (P, , T); R⁵ (P, E); R⁶ (P, , J)}.

15.14.2 Optimal 3NF (1974) Definite Optimal third normal form [1974Cod] (abbr. o3NF) applies to any irreducible [2000Dat] Third normal form (abbr. i3NF) being part of an Optimal 3NF database schema (please, see: SCHEMA OPTIMIZATION chapter).

15.14.3 3NF express The designer may encounter 3NF tables just reading the 1NF table attributes. There are two cases in which R is in third normal form "without more investigations" [1971Cod]: T1. There is no nonkey attributes, e.g. R (P, L, J); T2. There is one nonkey attribute, e.g. R (P, L, J).

15.14.4 ThirdNF procedure The ThirdNF proc checks if there is some non-key attribute being dependent on some determinant being part of a candidate key invoking SecondNF proc. Then, when the nonkey domain is free of partially dependent attributes, the internally dependent nonkey attributes are the redundant attributes. FirstNF proc extracts the nonkey dependent attributes returning a miβ (which is the minimal nonkey domain) and the list of pairs of internally dependent attributes (if any). Then, theAliceNFproc consolidates the FirstNF miβ as new nonkey domain and creates the corresponding referenced 3NF tables with the mentioned list of attributes offending the nonkey minimality. Each determinant will be a foreign key as part of nonkey domain and —at the same time— the primary key of a new 3NF table.

E. Villar THIRD NORMAL FORM FUNDAMENTALS

15.14 Third normal form

"A relation R is in third normal form if R is in 2NF and each attribute is immediately dependent on any key" [1971Cod].

"R is in 2NF" means that R, before the 3NF check, is already free of partial dependence. In other words, the nonkey domain is free of partial dependences, and only the transitive dependences between pairs of nonkey attributes remain to be dealt with.

However, once the structural concepts of determinator, node and descriptor (all of them connected by intransitive lines) have been introduced, the 'immediately dependent' concept covers not only being part of an intransitive line but also being "adjacent" [1994Bro] to its determinant.

15.14.5 FK optimizes transitive checks

Looking for transitive dependences in R, let x be the PK, y and z two elements of the nonkey domain (initially, two attributes), and 〈y, z〉 an element of Ω². Normally, every pair 〈y, z〉 must be checked for an FD y→z, with the possibility of discarding the dependent z from R. But if z is a FK, z cannot be discarded; consequently, a FK cannot be the dependent, i.e. the z part of y→z. Every foreign key of the nonkey domain can be the left-hand side of y→z but cannot be the right-hand side, so every foreign key of the nonkey domain reduces the number of transitive checks, and probably the number of partial dependency checks as well.

16. THIRD NORMAL FORM 2.0

THIRD NORMAL FORM 2.0 synthesizes and formalizes, in plain English, a direct 3NF design tool with the following sections: • Design defects independency; • First normal form 2.0; • Second normal form 2.0; • Alice normal form 2.0; • Third normal form 2.0; • Logical record design; • First-class business model; • Overlapping mißes; • Third normal form categories; • Heath: 3NF categories; • 3NF category 4; • 3NF category 5; and • Resume of 3NF categories.

16.1 Design defects independency Normalization first repairs the partial dependence (abbr. PD) and then the transitive dependence (abbr. TD). The main asset of 1NF is a minimal primary key, which 2NF inherits. PD and TD are mutually independent design defects. Combining them yields the following clades [2015WCL] of relational tables, numbered as levels 4 to 7 of the table below: 4. R with PD and TD defects, i.e. 1NF; 5. R with a TD defect only, i.e. 2NF; 6. R with a PD defect only, i.e. the new ANF; and 7. R with no defects, i.e. 3NF.

There are seven conceptual levels of normal forms. The last two columns describe the minimal nonkey domain:

Level                   UNF objects  Designated PK  Minimal PK  Nonkey PD free  Nonkey TD free
1. Unnormal form        T            T              T           n/a             n/a
2. 0NF (inflated PK)    F            T              F           n/a             n/a
3. 1NF tabulation       F            T              T           n/a             n/a
4. First normal form    F            T              T           F               F
5. Second normal form   F            T              T           T               F
6. Alice normal form    F            T              T           F               T
7. Third normal form    F            T              T           T               T

16.2 Reminder on primary key The internal dependence defect applies only to the designated primary key of a table in zero normal form, and it is repaired by an active FNF method. In a 1NF tabulation, PD and TD defects are unknown. R in Unnormal form is repaired by the Normal procedure of 1970 [1970Cod], which pushes down the UNF primary key. As stated below, that key is as minimal as the RM 1NF primary key. Both minimal primary keys have been axiomatic assumptions since the beginning of computer file systems. The Normal procedure is a section of the UNF DATA REDUNDANCY report.

16.3 First normal form 2.0 First normal form 2.0 is an active FNF procedure. It supplies irredundant multiattributes to the future 2NF, ANF and 3NF relations by checking and repairing the designated candidate keys that have an internal dependence defect (already defined in the DATABASE DEPENDENCY report).
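The internal dependence check that First normal form 2.0 performs on a designated key can be sketched with the classic attribute-closure algorithm. What follows is a minimal Python sketch, not the author's published procedure. It assumes that FDs are given as (left-hand side, right-hand side) pairs of frozensets, and the helper names `closure` and `reduce_designated_key` are hypothetical. A key attribute is dropped when the remaining key attributes already determine it, which is the "K⊋x→y⊊K" pattern of internal dependence.

```python
from typing import FrozenSet, Iterable, List, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]  # (determinant, dependent attributes)


def closure(attrs: Iterable[str], fds: List[FD]) -> FrozenSet[str]:
    """Attribute closure: every attribute that attrs determines under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def reduce_designated_key(
    key: Iterable[str], fds: List[FD]
) -> Tuple[FrozenSet[str], List[Tuple[FrozenSet[str], str]]]:
    """Drop every key attribute that the rest of the key already determines.

    Returns the reduced key and the (remaining key, dropped attribute) pairs,
    i.e. the internal dependences that were repaired.
    """
    current = set(key)
    dropped: List[Tuple[FrozenSet[str], str]] = []
    for attr in sorted(current):  # alphabetical order: an arbitrary but repeatable choice
        rest = current - {attr}
        if rest and attr in closure(rest, fds):
            current = rest
            dropped.append((frozenset(rest), attr))
    return frozenset(current), dropped


# 17.9.3 pattern: designated key (x, y, z) with the internal dependence y→z.
key, dropped = reduce_designated_key({"x", "y", "z"}, [(frozenset({"y"}), frozenset({"z"}))])
print(sorted(key), dropped)
```

Applied to the 17.9.3 pattern (a designated key (x, y, z) with y→z), the sketch keeps (x·y) and reports z as internally dependent. The order in which attributes are tried is arbitrary here. When two key attributes codetermine each other, that order decides which one survives, much like the designer's arbitrary choice in 16.9.1.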

16.4 Second normal form 2.0 A relation R is in Second normal form (abbr. 2NF) if it is in 1NF and its nonkey domain is free of partial dependences. In Codd's original wording, "each attribute of R fully depends on each candidate key but not on any subset of it" [1971Cod].

16.5 Alice normal form 2.0 A relation R is in Alice normal form (abbr. ANF) if it is in 1NF and its nonkey domain is irredundant. This new normal form follows directly from admitting that the PD and TD design defects are independent. A relation in ANF is free of transitive dependence, i.e. "each nonkey attribute is intransitively dependent on the primary key". ANF is useful because it makes the NORMAL proc modular.
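Because PD and TD are independent, levels 4 to 7 of the table in 16.1 can be told apart by two yes/no tests. The following is a minimal Python sketch under stated assumptions:

- FDs are (frozenset, frozenset) pairs.
- The designated key is already minimal.
- Following the pairwise check described for 3NF, the TD test only looks at single nonkey attributes as determinants. Composite nonkey determinants are not examined.

The function names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]  # (determinant, dependent attributes)


def closure(attrs: Iterable[str], fds: List[FD]) -> FrozenSet[str]:
    """Attribute closure: every attribute that attrs determines under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def has_partial_dependence(key: FrozenSet[str], nonkey: FrozenSet[str], fds: List[FD]) -> bool:
    """PD: a proper, non-empty part of the key already determines some nonkey attribute."""
    for size in range(1, len(key)):
        for part in combinations(sorted(key), size):
            if closure(part, fds) & nonkey:
                return True
    return False


def has_transitive_dependence(nonkey: FrozenSet[str], fds: List[FD]) -> bool:
    """TD, checked pairwise: one nonkey attribute y determines another nonkey attribute z."""
    for y in sorted(nonkey):
        if (closure({y}, fds) - {y}) & nonkey:
            return True
    return False


def normal_form_level(attrs: Iterable[str], key: Iterable[str], fds: List[FD]) -> str:
    """Map the two independent defects onto levels 4 to 7 of the 16.1 table."""
    key, attrs = frozenset(key), frozenset(attrs)
    nonkey = attrs - key
    pd = has_partial_dependence(key, nonkey, fds)
    td = has_transitive_dependence(nonkey, fds)
    return {(True, True): "1NF", (False, True): "2NF",
            (True, False): "ANF", (False, False): "3NF"}[(pd, td)]


def fd(lhs: str, rhs: str) -> FD:
    """Shorthand for the examples: fd("x y", "w") builds (x·y)→w."""
    return frozenset(lhs.split()), frozenset(rhs.split())


# 17.10.1 pattern: R ((x·y), z, w) with (x·y)→w and y→z.
print(normal_form_level({"x", "y", "z", "w"}, {"x", "y"}, [fd("x y", "w"), fd("y", "z")]))  # ANF
```

For the 17.10.1 pattern, R((x·y), z, w) with y→z, the sketch answers ANF (a PD defect only), which is what the clade list in 16.1 predicts.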

2016-01-2412:51:18 Page 40 of 76

16.6 Third normal form 2.0

A relation R is in Third normal form if it is in 1NF and its nonkey domain is minimal, i.e. free of partial dependences and free of internal dependences. In Codd's original wording, "if R is in 2NF and each attribute is immediately dependent on any key" [1971Cod].

16.7 Logical record design "In the past, design of records (computerized or not) for commercial, industrial and government institutions has been oriented in an ad hoc way to the needs of particular applications" [1971Cod]. "The large and integrated databases of the future sorely needs application independent guidelines for the logical record design" [1971Cod]. Third normal form engineering intends "to provide such guidelines" [1971Cod]. "Physical records in third normal form will prove being highly economical in space consumed" [1971Cod].

16.8 First-class business model "Although the three normal forms are query equivalent, there is a difference in information content of the three forms. A business model in third normal form is likely to be more readily understood by people who are not everyday users of the data" [1971Cod]. In this sense, "the second [normal form] is more informative than the first, and the third is more informative than the second. The increased information lies in the data description, the third normal form tends to capture some aspects of the [business] semantic" [1971Cod], aspects that are less evident in an Undisciplined predicate (as already stated below). Besides, third normal forms are "also likely to be better tuned to the authorization requirements of installation" [1971Cod].

16.9 Overlapping mißes There is a known case [1982Zan] of two overlapping mißes connected by a codetermination socket: let x, y, z be the attributes of R, and let (y·x)↔(x·z) hold. Maintaining both mißes induces a data design loop, which Zaniolo categorizes as a breach of the "representation principle" [1982Zan]. However, from a Logical point of view, both mißes of a given relation structure are always visible, just as any single attribute is. Two attributes never overlap each other, so the designer must intervene as usual and select the primary key as the determinator of the dependency structure.

16.9.1 A normal PK selection case Let x, y, z be the attributes of R, and let (y·x)↔(x·z) hold. The rule is to kill one of the conflicting mißes. In other words, one of the determinants "is arbitrarily designed as the" [1971Cod] determinator of R's dependency structure, i.e. the future PK of R, and the other miß becomes a vanishing determinant. The formal designation of the determinator uses the usual NOR thesis [1965H&L] format: IF (y·x)↔(x·z) THEN {(x·y)↣z} NOR {(x·z)↣y}

16.10 Third normal form categories From the point of view of Data design, there are three key players in a classification of 3NF tables:

• The primary key (the "PK") is minimal and mandatory.
• The nonkey domain of elementary attributes is irredundant and minimal; it can be empty.
• The denotative domain is the set of alternate keys; it can be empty, and each alternate key is minimal.

Taking into account that only the primary key is mandatory, we find four categories of third normal forms.

16.10.1 The four categories of 3NF tables

Combining the mandatory primary key with the optional components, four canonical structures of database predicate emerge:

Name              PK   Denotative domain   Nonkey domain
3NF Category 1    T    F                   F
3NF Category 2    T    F                   T
3NF Category 3    T    T                   F
3NF Category 4    T    T                   T

Category 4 completes the three 3NF categories discovered and defined by Heath, and it is defined with the same parameters as the previous three.
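The four categories of 3NF tables can be recognized mechanically once the FD set is known:

- Candidate keys are the minimal attribute sets whose closure is the whole relation.
- The denotative domain is non-empty when there is more than one candidate key.
- The nonkey domain is non-empty when some attribute belongs to no candidate key.

The following is a brute-force Python sketch. It assumes small relations and FDs given as frozenset pairs, and `candidate_keys` and `category_3nf` are hypothetical names. Category 5 cannot be detected this way. A Category 5 table would be reported as Category 2, since the difference lies in the dimensional meaning of its key and the quantitative nature of its nonkey attribute.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]  # (determinant, dependent attributes)


def closure(attrs: Iterable[str], fds: List[FD]) -> FrozenSet[str]:
    """Attribute closure: every attribute that attrs determines under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def candidate_keys(attrs: Iterable[str], fds: List[FD]) -> List[FrozenSet[str]]:
    """All minimal attribute sets that determine the whole relation (exponential search)."""
    attrs = frozenset(attrs)
    keys: List[FrozenSet[str]] = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            cand = frozenset(combo)
            if any(k <= cand for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(cand, fds) == attrs:
                keys.append(cand)
    return keys


def category_3nf(attrs: Iterable[str], fds: List[FD]) -> int:
    """Category 1 to 4 from the presence of alternate keys and of nonkey attributes."""
    attrs = frozenset(attrs)
    keys = candidate_keys(attrs, fds)
    prime = frozenset().union(*keys)
    has_alternate = len(keys) > 1        # denotative domain
    has_nonkey = bool(attrs - prime)     # nonkey domain
    return {(False, False): 1, (False, True): 2,
            (True, False): 3, (True, True): 4}[(has_alternate, has_nonkey)]


def fd(lhs: str, rhs: str) -> FD:
    """Shorthand for the examples: fd("Mgr#", "Dept#") builds Mgr#→Dept#."""
    return frozenset(lhs.split()), frozenset(rhs.split())


# Heath's DM example extended with DivisionName (16.12.1).
print(category_3nf({"Dept#", "DivisionName", "Mgr#"},
                   [fd("Dept#", "Mgr# DivisionName"), fd("Mgr#", "Dept#")]))  # 4
```

On Heath's examples, PW (P#, W#) without FDs gives Category 1, and P (P#, PN, PC, PP) gives Category 2. DM (Dept#, Mgr#) gives Category 3, and DM with DivisionName gives Category 4.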

16.11 Heath: 3NF categories «It is clearly necessary to define normal forms in terms of dependency structure» Armstrong [1974Arm]. Before this statement by Armstrong, Ian J. Heath [1971Hea] proposed the following Linnaean taxonomy of the different Third normal forms (abbreviated TNF by Heath): 3NF Category 1; 3NF Category 2; 3NF Category 3; and TNF Category 4. It is worth noting that the x, y and z variables of Heath's FD specs are legal attribute subsets of R. We assume this convention in both Functional dependency V1.1 and V1.2.

16.11.1 3NF category 1 "There are no functional dependencies involving all attributes, i.e. {x↛y; y↛x}" [1971Hea]. Definition 4: 3NF category 1 (Heath) Example: "PW (P#, W#) meaning the part with number P# is stocked in the warehouse with number W#. A given warehouse may stock several parts, and a given part may be stocked in several warehouses: {P#↛W#; W#↛P#}" [1971Hea].

16.11.2 3NF category 2 "There is one functional dependency (and one non-functional dependency) involving all attributes, i.e. {x→y; y↛x}" [1971Hea]. Definition 5: 3NF category 2 (Heath) Example: "P (P#, PN, PC, PP) meaning the part with number P#, its description PN, its color PC, and its price PP; P# is unique but some given combination of (PN, PC, PP) can describe different parts: {P#→(PN, PC, PP); (PN, PC, PP)↛P#}" [1971Hea]. Heath's exact example is "P (P#, PN)" [1971Hea], whose converse would be of less value as an example of 3NF Category 5.

16.11.3 3NF category 3 There are only two functional dependencies involving all attributes, one being the converse of the other, i.e. {x→y; y→x} [1971Hea]. Definition 6: 3NF Category 3 (Heath) Example: "DM (Dept#, Mgr#) meaning the department with number Dept# has the manager with number Mgr#. A given department has only one manager who only manages one department" [1971Hea]: {Dept#→Mgr#; Mgr#→Dept#} [1971Hea]. Note that, in general, TNF category 3 is just a PK with a non-empty set of AKs.
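Heath's categories are stated as FDs that do or do not hold. Against a fair corpus of rows, an FD can be tested by checking that equal left-hand values never map to different right-hand values. The following is a minimal Python sketch in which `fd_holds` is a hypothetical helper and the sample rows are invented for illustration. A corpus can only refute an FD; it never proves that the FD holds for every future state of the relation.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

Row = Dict[str, Hashable]


def fd_holds(rows: List[Row], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """True if no two rows agree on lhs while disagreeing on rhs."""
    seen: Dict[Tuple[Hashable, ...], Tuple[Hashable, ...]] = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it afterwards
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical rows for Heath's DM (Category 3) and PW (Category 1) examples.
dm_rows = [{"Dept#": "D1", "Mgr#": "E7"}, {"Dept#": "D2", "Mgr#": "E9"}]
pw_rows = [{"P#": "P1", "W#": "W1"}, {"P#": "P1", "W#": "W2"}, {"P#": "P2", "W#": "W1"}]

print(fd_holds(dm_rows, ["Dept#"], ["Mgr#"]), fd_holds(dm_rows, ["Mgr#"], ["Dept#"]))  # True True
print(fd_holds(pw_rows, ["P#"], ["W#"]), fd_holds(pw_rows, ["W#"], ["P#"]))  # False False
```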

16.11.4 TNF category 4 There are several overlapping dependencies, i.e. {(x, y)→z; (x, z)→y} [1971Hea]. Definition 7: TNF category 4 (Heath) Example: "R (Subj, Child, Posn) meaning Posn is the child's position in that subject. A given child has a unique position for each subject, and so (Child, Subj) determines Posn. Moreover, a given position in a subject determines the child, and so (Subj, Posn) determines Child" [1971Hea]:



{(Subj, Child)→Posn; (Subj, Posn)→Child} [1971Hea]. "While overlapping dependencies are theoretically possible they will seldom occur in practice, and even when they occur in reality it seems unlikely both dependencies would be imposed by the data management system" [1971Hea]. The first three Heath categories are three pearls. About the fourth category, however, Ian J. Heath himself had criticisms and doubts. Nevertheless, a 3NF Category 4 does exist.

16.11.5 Criticism on Heath's TNF category 4 It is true that Category 4 has overlapping subsets, but Categories 2 and 3 can have them too. Overlapping is therefore not a discriminating Linnaean characteristic within the TNF family discovered by Heath himself. Moreover, formula (1), R: {(x, y)→z; (x, z)→y}, is the formal definition of TNF Category 4 [1971Hea]. Its main formal flaw is that it has an equivalent formula (2), R: {(x, y)↔(x, z)}, and formula (2) matches the 3NF Category 3 formula. Therefore formula (1) is a 3NF Category 3 formula and cannot define another category at the same time.
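The equivalence of formulas (1) and (2) can be checked with Armstrong's inference rules [1974Arm]. The derivation uses augmentation (from X→Y infer XZ→YZ) and the derived decomposition rule (from X→YZ infer X→Y). The following short derivation is written with the attribute sets of formula (1):

```latex
\begin{aligned}
(x,y)\to z &\;\Rightarrow\; (x,y)\to(x,y,z) && \text{augmentation by } (x,y)\\
           &\;\Rightarrow\; (x,y)\to(x,z)   && \text{decomposition}\\
(x,z)\to y &\;\Rightarrow\; (x,z)\to(x,y,z) && \text{augmentation by } (x,z)\\
           &\;\Rightarrow\; (x,z)\to(x,y)   && \text{decomposition}\\
\text{hence (1)} &\;\Rightarrow\; (x,y)\leftrightarrow(x,z) && \text{formula (2)}\\
\text{conversely, (2)} &\;\Rightarrow\; (x,y)\to z \text{ and } (x,z)\to y && \text{decomposition, formula (1)}
\end{aligned}
```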

16.12 3NF category 4 The fully tuned Category 4 has a primary key, a nonkey domain and a minor-key domain. There are three functional dependencies involving three mißes, together with one non-functional dependency, i.e. {z→x; x→z; x→y; y↛x}. Taking advantage of the FD 2.1 devices, 3NF Category 4 is: {(z)↔(x)↣(y)}.

16.12.1 Example of 3NF category 4 Example: DM(Dept#, DivisionName, Mgr#∊EMP) meaning the department with number Dept# is part of a division and has a manager. The manager's number is called Mgr# in the referencing DM table and Personnal# in the EMP table (please see [1971Hea]). A given department has only one manager who only manages one department: {Dept#→Mgr#; Mgr#→Dept#} [1971Hea]. Obviously, each department is part of a given division, i.e. {Dept#→DivisionName; DivisionName↛Dept#}. Please take into account that the z subset is just the first alternate key of this category. It also stands for any number of other possible alternate keys in the current formula. TNF categories 3 and 4 always have a non-empty set of alternate keys.

16.13 3NF category 5 The elaborate 3NF Category 5 is the protagonist of the Star schema of Data Warehouses. Category 5 is the converse of Category 2. The DW-oriented 3NF Category 5 has a multidimensional attribute primary key, coming from the nonkey domain of a 3NF Category 2, and its nonkey domain is a quantitative attribute. There is one functional dependency (and one non-functional dependency) involving all attributes, i.e. {y→x; x↛y}. Taking advantage of FD 2.1, TNF Category 5 is:

(y)↣(x).

16.13.1 Example of 3NF category 5 Example: DIM ((PN, PC, PP), Q) meaning that each combination of the three dimensions {piece description PN, color PC, price PP} is unique, while the same Q can describe different (PN, PC, PP) combinations, i.e. {(PN, PC, PP)→Q; Q↛(PN, PC, PP)}.


16.14 Resume of 3NF categories This section summarizes all the 3NF categories together.

FD 2.1          Category description
(x·y)           3NF Category 1 has a primary key only, which is a miß (or a single attribute).
(x)↣(y)         The useful Category 2 has the mandatory primary key plus a nonkey domain.
(z)↔(x)         3NF Category 3 has a primary key and a non-empty set of alternate keys.
(z)↔(x)↣(y)     The fully tuned Category 4 has a primary key, a nonkey domain and a minor-key domain.
(q)↢(y)         The DW-oriented 3NF Category 5 has a multidimensional attribute primary key, coming from the nonkey domain of a 3NF Category 2, and its nonkey domain (q) is just a quantitative attribute.

17. DATA DESIGN REVIEW Data review has been part of software development inspection from the beginning, as Fagan [1976Fag] stated: 1. Substantial net improvements in programming quality and productivity have been obtained through the use of formal inspections of design and of code. 2. Improvements are made possible by a systematic and efficient design and code verification process, with well-defined roles for inspection participants [1993G&G]. 3. The manner in which inspection data is categorized and made suitable for process analysis is an important factor in attaining the improvements. 4. It is shown that by using inspection results, a mechanism for initial error reduction followed by ever-improving error rates can be achieved. ―M. E. Fagan, "Design and code inspections to reduce errors in program development". The order of exposition in the DATA DESIGN REVIEW report is: • The art of database normalizing; • Data review using FD; • Normalization of a legacy schema; • Normalization of a Table; • Normalization summary; • NORMAL proc; • Overview of normalization; • Review Prolog; • Active first normal form; • Second normal form; • Third normal form; • Review Epilog.

17.1 The art of database normalizing The art of database normalizing has three parts: 1. Dependency theory; 2. Normalizing a table that is in 'first normal form' (abbr. 1NF); 3. Optimizing a database schema. The main goal of normalization is to guarantee that the designed tables cannot contribute to database insertion, update and deletion anomalies. Such anomalies come from some known patterns of data redundancy in the operational rows of R. Happily, such redundancy has associated symptoms at the table design phase, known as 'transitive dependence' and 'partial dependence'. Consider a relation R(A, B, C, D∈T, …) together with R*, its fair corpus of rows. The function of Dependency Theory is to provide a general method for selecting a dependency structure for the attribute set of R.

Each dependency structure occurs in R², i.e. in a set of oriented binary relations like x→y, and it forms a Tree-with-one-source, i.e. a digraph of points and lines. Each point of the tree is an attribute (x) or a maximally independent subset of attributes such as (x·y·z …). Composing a maximally independent subset of R (abbr. miß) takes a seed and three steps. The seed: let x↮y ≝ {〈x,y〉: x↛y & y↛x}; /step 1/ (z·w) ≝ {z↮w}; /step 2/ (y·z·w) ≝ y↮(z·w); and /step 3/ (x·y·z·w) ≝ x↮(y·z·w); /and so on/. Two types of holistic lines connect all the points of the tree: /a/ x↣y ≝ {x→y & y↛x}; and /b/ x↔y ≝ {x→y & y→x}. After a dependency tree has been synthesized, the next task is to tessellate it into a non-empty set of sub-trees so that each sub-tree involves the maximum number of attributes. Some of those attributes will gain a new reference to their domain, as in W∊T. As expected, the points (each representing an attribute or a miß) are connected by intransitive lines. Each sub-tree corresponds to a subset of immediate attributes, together with its references and its own primary key. Projecting the original corpus of rows onto the attributes of a sub-tree yields a set of rows without data redundancy. E. F. Codd discovered this type of relational table, without redundancy and with cohesive attributes, in 1971, and he called each of them a Third normal form (abbr. 3NF). Moreover, IF the corresponding natural join recovers all the rows of the original tabulation (no more, no less), THEN you have a proof, for that corpus, that the decomposition into Third normal form tables is lossless.
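The projection and natural-join check can be tried on a small corpus. The following is a minimal Python sketch under stated assumptions:

- Rows are dictionaries.
- Projection uses set semantics.
- The join is a naive nested loop.
- The corpus is hypothetical, and y→z holds in it.

`project`, `natural_join` and `is_lossless` are hypothetical names. Splitting along y→z reproduces the corpus exactly. Splitting on z, which determines nothing, produces spurious rows.

```python
from typing import Dict, Hashable, List, Sequence, Set, Tuple

Row = Dict[str, Hashable]


def project(rows: List[Row], attrs: Sequence[str]) -> Set[tuple]:
    """Relational projection with set semantics (duplicate rows vanish)."""
    return {tuple(row[a] for a in attrs) for row in rows}


def natural_join(left: Set[tuple], left_attrs: Tuple[str, ...],
                 right: Set[tuple], right_attrs: Tuple[str, ...]):
    """Nested-loop natural join on the attributes that both headings share."""
    common = [a for a in left_attrs if a in right_attrs]
    out_attrs = left_attrs + tuple(a for a in right_attrs if a not in left_attrs)
    result = set()
    for lt in left:
        lrow = dict(zip(left_attrs, lt))
        for rt in right:
            rrow = dict(zip(right_attrs, rt))
            if all(lrow[a] == rrow[a] for a in common):
                merged = {**lrow, **rrow}
                result.add(tuple(merged[a] for a in out_attrs))
    return out_attrs, result


def is_lossless(rows: List[Row], parts: List[Tuple[str, ...]]) -> bool:
    """True if joining the projections onto `parts` returns exactly the original rows."""
    attrs, acc = parts[0], project(rows, parts[0])
    for part in parts[1:]:
        attrs, acc = natural_join(acc, attrs, project(rows, part), part)
    if set(attrs) != set(rows[0]):
        raise ValueError("the parts must cover every attribute of R")
    return acc == project(rows, attrs)


# Hypothetical corpus of R (x, y, z) in which y→z holds.
corpus = [
    {"x": 1, "y": "a", "z": "p"},
    {"x": 2, "y": "a", "z": "p"},
    {"x": 3, "y": "b", "z": "q"},
    {"x": 4, "y": "c", "z": "p"},
]
print(is_lossless(corpus, [("y", "z"), ("x", "y")]))  # True: RY (y, z) and RXY (x, y)
print(is_lossless(corpus, [("x", "z"), ("y", "z")]))  # False: spurious rows appear
```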

17.2 Data review using FD One function of this theory is providing a general method for selecting a grammar for each language, given a corpus of this language. ―Noam Chomsky, Syntactic Structures [1957Cho]. Paraphrasing Chomsky, one function of Database dependency theory is to provide a general method for discovering the dependency structure of each relation, given a fair corpus of rows of this relation. The relevance of Database dependency theory lies in the discovery of Third normal form as the irredundant and cohesive relation.

17.2.1 Semantic interpretation The syntactic and semantic interpretation of the selected dependency structure, as a non-empty set of relations in third normal form, is also part of Dependency theory. It follows the "principle of representation" [1982Zan], i.e. it involves no semantic loss.

17.2.2 Dependency substructures Having a dependency structure of R enables its tessellation into dependency substructures. Each substructure is free of transitive dependencies, which allows an easy semantic 3NF interpretation, because transitivity cannot be attenuated inside the same relation. To become irredundant, a redundant relation needs to "emancipate" all its subordinate relations, after recognizing as a foreign key the future primary key of each newly emancipated table. The separation process is recursive, and working from the bottom to the top seems easier for human beings. Mathematically, however, it is a simultaneous deconstruction of all the existing intransitive structures.


17.2.3 Substructures representation Normalization ends by interpreting each dependency substructure semantically as the corresponding table in third normal form.

17.2.4 Cohesive database predicate A relation is cohesive because it represents only one database predicate, and it is irredundant because each row is an instance of a cohesive predicate. Normalization is the main and best-known application of Functional dependency theory.

17.3 Normalization of a legacy schema Normalization of a legacy schema starts by performing a full dependency analysis on the existing rows of every table. Legacy normalization can count on an invaluable empirical source of rows for building a trustworthy primitive FD structure. Once the FD specs have been obtained from the fair corpus of rows of table R, tessellating and interpreting the holistic dependency structure of the table is an easy task. A detailed example can be found in the FUNCTIONAL DEPENDENCY report (below).

17.4 Normalization of a Table In the formal Data review environment, it is assumed that the necessary dependency structure should be extracted from the common deliverables of a table. This is why, when speaking of a design "defect", we use the plain English "attribute y is a property of attribute x" of Hughes and Londey [1965H&L] and of Kent [1983Ken]. We prefer it to the arcane FD wording "y depends on x", which is in turn formalized as x→y for the sake of brevity. This data review has only one type of outcome: a fine-grained proposal that repairs all the formal defects of the table. There are 1+2 formal defects (internal, partial and transitive dependence) and a single solution for all three: isolating the set of immediately dependent attributes, together with its own primary key, in a 3NF table.

17.5 Normalization summary E. F. Codd found the two design defects known today: partial dependence and transitive dependence. However, the "first defect" should be internal dependence, which occurs when a designated key is redundant.

17.6 NORMAL proc The NORMAL procedure is the standard way of normalizing a 1NF entity, given its list of attributes and a "fair corpus" [1957Cho] of rows of the future table. Such a corpus of rows is not only a customary and opaque contribution. It is also the best way of demonstrating an understanding of the functionality of the current entity [1957Cho] and, above all, of testing mathematically the 3NF design of your entity. If the object of revision includes a tabulation, then the inspector needs an initial strategy for getting the maximal dependency information with the minimal number of FD graphs. The expected FD structure hangs from the primary key. Even if you find a determinant that is not the PK, do not change horses in midstream. For tables with many attributes, the dependency matrix is easier to manage than the classic FD specs, because it continually exposes the holes and makes sure that no attribute is forgotten.
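A dependency matrix of single-attribute FDs can be filled in directly from a corpus of rows. The following is a brute-force Python sketch. It is not Maier's OBSERVE method, and the helper names `fd_holds`, `dependency_matrix` and `render`, as well as the corpus, are hypothetical. Each cell marks whether the row attribute determines the column attribute on every row seen so far, which exposes the holes the text mentions.

```python
from typing import Dict, Hashable, List, Sequence

Row = Dict[str, Hashable]


def fd_holds(rows: List[Row], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """True if no two rows agree on lhs while disagreeing on rhs."""
    seen: Dict[tuple, tuple] = {}
    for row in rows:
        key, value = tuple(row[a] for a in lhs), tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


def dependency_matrix(rows: List[Row], attrs: Sequence[str]) -> Dict[str, Dict[str, bool]]:
    """matrix[a][b] is True when the single-attribute FD a→b holds on the corpus."""
    return {a: {b: fd_holds(rows, [a], [b]) for b in attrs if b != a} for a in attrs}


def render(matrix: Dict[str, Dict[str, bool]], attrs: Sequence[str]) -> str:
    """Text grid: row = determinant, column = dependent; '-' marks the diagonal."""
    lines = ["    " + " ".join(f"{b:>3}" for b in attrs)]
    for a in attrs:
        cells = ["  -" if a == b else ("  T" if matrix[a][b] else "  F") for b in attrs]
        lines.append(f"{a:>3} " + " ".join(cells))
    return "\n".join(lines)


# Hypothetical corpus of R (x, y, z) in which x is unique and y→z holds.
corpus = [
    {"x": 1, "y": "a", "z": "p"},
    {"x": 2, "y": "a", "z": "p"},
    {"x": 3, "y": "b", "z": "q"},
    {"x": 4, "y": "c", "z": "p"},
]
attrs = ["x", "y", "z"]
print(render(dependency_matrix(corpus, attrs), attrs))
```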


For a small number of attributes, Codd's classic FD specs save the effort of preparing the matrix.

{3NF R} = NORMAL ( AliceNF ( SecondNF ( InternalNF ( FDSpecs ( FDS ( ACTIVE {1NF R} ))))));
Procedure 2: NORMAL

NORMAL starts in the innermost call, building up a primitive FD structure by invoking the (FDS {1NF R}) function.
Step 1. ACTIVE proc projects every column to obtain its active domain. It saves the domains on a horizontal Word page, arranging the columns from left to right in order of the cardinality of their active domains. At the left, the candidate keys (if there is more than one) pop out, because their cardinality equals the number of rows of the table, and one of them is the primary key. All the attributes whose cardinality is lower than that of the primary key (if any) depend on it. On the other hand, the remaining attributes with that same cardinality (if any) are alternate keys, each codependent with the primary key.
Step 2. FDS proc makes selective use of Maier's OBSERVE method [1983Mai] until it has filled in a primitive FD matrix with minimal FD noise. Arranging a full attribute matrix is the first step.
Step 3. FDSpecs function selects the primitive FD specs, i.e. the FD set capable of forming the structure of a digraph of the type "Tree-with-one-source" (no node is repeated and there are no disconnected nodes).
Step 4. InternalNF method checks/repairs composite CK internal dependence.
Step 5. SecondNF method checks/repairs partial dependence.
Step 6. AliceNF method checks/repairs transitive dependence.

The result can take one of two forms. Either it is a set of primitive FDs that is directly interpretable (R was already in 3NF), or the primitive FD set contains a set of perfectly chained substructures. In the second case, the designer factors such a list of FDs out, from right to left, into proto-3NF substructures. Finally, the designer interprets each FD substructure as a 3NF table.
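Step 1 (ACTIVE) can be approximated in a few lines. It counts each column's active domain, orders columns from the highest to the lowest cardinality, and flags columns whose cardinality equals the number of rows. The following is a minimal Python sketch; `active_profile` and the corpus are hypothetical. It only finds single-attribute candidate keys, and on a sample corpus a column with all-distinct values is only a candidate, since later rows may refute it.

```python
from typing import Dict, Hashable, List, Tuple

Row = Dict[str, Hashable]


def active_profile(rows: List[Row], attrs: List[str]) -> Tuple[List[str], Dict[str, int], List[str]]:
    """Step 1 sketch: active-domain cardinality per column, columns ordered from the
    highest to the lowest cardinality, and the columns whose cardinality equals the
    number of distinct rows (single-attribute candidate keys for this corpus)."""
    n_rows = len({tuple(row[a] for a in attrs) for row in rows})
    cardinality = {a: len({row[a] for row in rows}) for a in attrs}
    # sorted() is stable, so ties keep the original column order.
    ordered = sorted(attrs, key=lambda a: -cardinality[a])
    key_candidates = [a for a in ordered if cardinality[a] == n_rows]
    return ordered, cardinality, key_candidates


# Hypothetical corpus of R (x, y, z) in which y→z holds.
corpus = [
    {"x": 1, "y": "a", "z": "p"},
    {"x": 2, "y": "a", "z": "p"},
    {"x": 3, "y": "b", "z": "q"},
    {"x": 4, "y": "c", "z": "p"},
]
ordered, cardinality, key_candidates = active_profile(corpus, ["z", "y", "x"])
print(ordered, cardinality, key_candidates)
# ['x', 'y', 'z'] {'z': 2, 'y': 3, 'x': 4} ['x']
```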

17.7 Overview of normalization Review & Tuning a table R has the following functional structure: a Prolog, three algorithms (FirstNF, SecondNF and ThirdNF), plus an Epilog.

17.7.1 Normalization prolog The Prolog disassembles one of the overlapping candidate keys (if any). Other complements can be added, for instance, normalizing an orphan table along the way.

17.7.2 FirstNF method The FirstNF method (abbr. INF) checks the known CKs and assembles the remaining CKs (if any) as irredundant and minimal multiattributes.

17.7.3 SecondNF method The SecondNF method (abbr. SNF) discards the partially dependent attributes from the nonkey domain (if there is some composite candidate key). Each (part-CK→nonkey) pair is passed to the Epilog task.

17.7.4 ThirdNF method The ThirdNF method (abbr. TNF) ensures that the nonkey domain is irredundant (free of internal dependence), thereby obtaining an intransitive relation. Each (nonkeyX→nonkeyY) pair of attributes (if any) is handed over to the Epilog method.

17.7.5 Normalization epilog Epilog is the last step of the normalization procedure. It assembles the (nonkeyX→nonkeyY) pairs into new tables. The Epilog of each Normalization performs the necessary DBA tasks: − checking whether each new entity is already in the SQL directory (a case of weak redundancy [1979Cod]) in order to alter it, i.e. create new columns; − otherwise (if the new entity is not in the SQL directory), the normalizer creates the table as usual.

17.8 Prolog of data review In the case of two overlapping irredundant multiattributes, the attribute-attribute behavior does not apply, because two attributes never overlap each other. The solution is an intervention of the designer, who simply discards whichever conflicting multiattribute he or she considers less meaningful. The prolog also allows other small filters, such as rejecting the design of an entity because it is a derivable tabulation, i.e. an SQL VIEW.

17.9 Active first normal form
DEPENDENCE: A relation R with a designated PK suffers Internal Dependence if a key attribute y is a property of another key attribute x, i.e. x→y.
NORMAL FORM: R becomes RM compliant (i.e. R is in 1NF) simply by leaving the attribute y out of the key. Without the offending attribute, the shorter key will be irredundant.

17.9.1 Internal dependence Internal dependence occurs when, inside a designated multiattribute key, an attribute depends on a disjoint part of that key. The key has internal redundancy, but the relation can still be free of partial dependence and of transitive dependence. Such a multiattribute of R is clearly redundant. The formula of any internal dependence is "K⊋x→y⊊K", where x⊍y (x and y are disjoint) and K is a multiattribute.

17.9.2 Internal redundancy Internal redundancy of a key has not been taken into account until now, but it can exist. For instance, the FD y→z of any transitive dependence, i.e. {x→y; y→z; x→z}, is a pattern of internal dependence, not inside a key but inside the nonkey domain. Zero normal form is a particular case of internal redundancy in a designated primary key, although it lies on the border of the RM world. Maier's superkey concept [1983Mai] is a set of prime attributes of R that includes a candidate key. A superkey is therefore an irredundant multiattribute M of R, but M is not a minimal "candidate key". Now, a superkey M′ of R that includes a candidate key of R and a nonkey attribute is a conspicuous case of internal dependence.

17.9.3 Internal refactoring Let R (x, y, z) have only an internal dependence defect in its designated multiattribute PK. The solution is simply to reduce the redundant key in favor of a new relation, which takes the partial determinant as primary key and the dependent attribute as its nonkey subset.
R (x, y, z);
Let R∍y, z∊{R MINUS y} and z↮x; x↮y; y↣z;
R: {x↮y; y↣z};                    -- minimal internal dependence
R: (x·y)↣y↣z;                     -- FD structure
R: {RY: (y↣z); RXY: (x·y∊RY)};    -- refactoring


17.9.4 ID semantical interpretation
R: {RY: (y↣z); RXY: (x·y∊RY)};
R: {RY (y, z); RXY (x·y∊RY)};     -- 3NF interpretation
The interpretation specifies the referenced relation before the referencing relation.

17.10 Second normal form
DEPENDENCE: A relation R in 1NF labors under a Partial Dependence defect if a nonkey attribute z is a property of a key attribute y that is a proper part of a composite CK, i.e. y→z.
NORMAL FORM: R is 2NF compliant if every nonkey attribute is independent of every proper subset of every composite CK.
Partial Dependence is a defective design that occurs when a nonkey attribute is a property of a proper subset of a multiattribute primary key. The task of the SecondNF method is to clean the nonkey domain of the attributes that depend on a part of any candidate key of R.

17.10.1 Partial refactoring Assuming only a partial dependence defect, the solution is a 2NF refactoring that yields two 3NF tables. Let R ((x·y), z, w), where:
R: {(x·y)↣(z·w); y↣z};                -- PD pattern
R: {(x·y)↣w; y↣z};                    -- primitive FD specs
R: {(x·y)↣w; y↣z; (x·y)⊋y};           -- IF (x·y) ⊋ y
R: {(x·y)↣w; y↣z; (x·y)↣y};           -- THEN (x·y)↣y
R: {(x·y)↣w; (x·y)↣y; y↣z};           -- change FD order
R: {w↢(x·y); (x·y)↣y; y↣z};           -- prepare FD structure
R: {(w)↢(x·y)↣(y)↣(z)};               -- FD structure
R: {RXY: {w↢(x·y∊RY)}; RY: {y↣z}};    -- refactoring
R: {RXY: {(x·y∊RY)↣w}; RY: {y↣z}};    -- beautify
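The internal, partial and transitive refactorings (17.9.3, 17.10.1 and 17.11.1) all split R on the offending FD lhs→rhs. A new relation keyed by the determinant takes the dependent attributes, and the determinant stays behind in R as a foreign key. The following is a minimal Python sketch of that schema-level split, in which `split_on_fd` is a hypothetical name. The split is lossless by the result usually called Heath's theorem: if X→Y holds in R(X, Y, Z), then R equals the natural join of its projections on (X, Y) and (X, Z).

```python
from typing import FrozenSet, Iterable, Tuple


def split_on_fd(attrs: Iterable[str], lhs: Iterable[str],
                rhs: Iterable[str]) -> Tuple[FrozenSet[str], FrozenSet[str]]:
    """Split R on the offending FD lhs→rhs.

    Returns (referenced, referencing): the new relation keyed by lhs that takes the
    dependent attributes, and what remains of R, where lhs stays as a foreign key.
    """
    attrs, lhs = frozenset(attrs), frozenset(lhs)
    rhs = frozenset(rhs) - lhs
    if not rhs or not (lhs | rhs) <= attrs:
        raise ValueError("the FD must be non-trivial and use attributes of R only")
    referenced = lhs | rhs        # e.g. RY (y, z)
    referencing = attrs - rhs     # e.g. RXY (x·y∊RY, w)
    return referenced, referencing


# 17.10.1 pattern: R ((x·y), z, w) with the partial dependence y→z.
ry, rxy = split_on_fd({"x", "y", "z", "w"}, {"y"}, {"z"})
print(sorted(ry), sorted(rxy))  # ['y', 'z'] ['w', 'x', 'y']
```

The same call with R (K, y, z) and y→z gives RY (y, z) and RK (K, y), the 3NF micro-schema of the transitive case.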

17.10.2 PD semantical interpretation
R: {RXY: {(x·y∊RY)↣w}; RY: {y↣z}};
R: {RY(y, z); RXY ((x·y∊RY), w)};     -- 3NF micro-schema

17.11 Third normal form
DEPENDENCE: An R in 2NF has a Transitive dependence defect when a nonkey attribute z is a property of another nonkey attribute y, i.e. y→z.
NORMAL FORM: R is 3NF compliant when the nonkey subset of R is a minimal, irredundant multiattribute.
Transitive dependence is a defective design that occurs when R is already 2NF compliant (i.e. all nonkey attributes are fully dependent on every candidate key) and yet a nonkey attribute z is a property of another nonkey attribute y, i.e. y→z. The first necessary condition for this defect is the existence of a multiattribute nonkey domain; the second is that the nonkey subset suffers internal dependence. Formally, let z≠y be nonkey attributes of R; z is a property of y, i.e. "z←y" ("z depends on y"), although we usually write "y→z".

17.11.1 Transitive refactoring Assuming only a transitive dependence defect, the solution is a 3NF refactoring that yields two 3NF tables. Let y∊{R MINUS K}, z∊{R MINUS K} and y↣z; by key definition, K↣y & K↣z. Then:
R: {K↣y; K↣z; y↣z};
R: {K↣y; y↣z; K↣z};                   -- properly chained
R: {K↣y; y↣z; K↣z} ≡ {K↣y; y↣z};      -- transitive reduction
R: {(K)↣(y)↣(z)};                     -- minimal transitive structure
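The transitive reduction step, {K↣y; y↣z; K↣z} ≡ {K↣y; y↣z}, can be automated by dropping every FD that the remaining FDs already imply. The following is a minimal Python sketch using attribute closure; `closure`, `remove_redundant_fds` and `fd` are hypothetical names. When several FDs are mutually redundant, the result depends on their order in the list, because a reduced FD set is not unique in general.

```python
from typing import FrozenSet, Iterable, List, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]  # (determinant, dependent attributes)


def closure(attrs: Iterable[str], fds: List[FD]) -> FrozenSet[str]:
    """Attribute closure: every attribute that attrs determines under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def remove_redundant_fds(fds: List[FD]) -> List[FD]:
    """Drop, in list order, every FD that the remaining FDs already imply."""
    kept = list(fds)
    i = 0
    while i < len(kept):
        lhs, rhs = kept[i]
        others = kept[:i] + kept[i + 1:]
        if rhs <= closure(lhs, others):
            kept = others  # implied: drop it and test the FD that now sits at position i
        else:
            i += 1
    return kept


def fd(lhs: str, rhs: str) -> FD:
    """Shorthand: fd("K", "y") builds K→y."""
    return frozenset(lhs.split()), frozenset(rhs.split())


for lhs, rhs in remove_redundant_fds([fd("K", "y"), fd("K", "z"), fd("y", "z")]):
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
# K -> y
# y -> z
```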

17.12 Epilog of data review If there is some orphan table, it will be repaired with the Ontological approach. The existence of every new relation will be checked in the SQL directory by the primary key name. IF the relation exist, THEN a second check on the existence of each attribute name will decide if the attribute existed or it is a new one. OTHERWISE, the new intransitive relation will be incorporate to the development directory, together all its columns.

18. HOLISTIC DATA DESIGN "Unlike a prescription, which describes the steps or recipes for solving some problem, a pattern describes the desired result" Meszaros & Doble (2014) [2014M&D]. Designing entities from scratch is the business of Logical design and the first application of a solid Third normal forms classification. This report is on a data design directly in 3NF: • 3NF on database structures; • Summary of Data Design; • 3NF on database structures; • Internet 3NF database challenge; • Main classes of 3NF predicates; • Associative patterns; • 'Bread & butter' patterns; • Characteristic table pattern; • Club & Heath pattern; • ⅅ: {ℬread & ℂlub UNION} patterns; • Palette of primary keys; and • Each entity type, its 3NF predicate.

18.1 3NF on database structures The canonical RM holds the tables in third normal form: (1) The attribute set of R is complete, i.e. all known attributes are present in the same table; (2) The primary key is a single attribute or it is an irredundant multiattribute of R; (3) The subset of nonkey attributes (if any), it is an irredundant and minimal multiattribute or a single attribute; (4) The primary key determines every nonkey attribute; (5) None attribute of the nonkey subset is determined by a proper subset of the primary key; because the primary key is not only irredundant but minimal; (6) Each alternate key (if any), it is an irredundant and minimal multiattribute or a single attribute which denotes the primary key and, at the same time, the primary key identifies it; and (7) Alternate keys are so in the shadow, i.e. without documenting them in SQL.

18.2 Main classes of 3NF predicates This chapter continues the Heath's trail of classifying 3NF. The first three categories of Heath is the seed of our 3NF classification. There are four classes of consistent database predicates. Each of them is one the four possible design types of a set of attributes together its primary key, as already discovered playing with the Boolean combination of the three known components of R: • The mandatory primary key; • The handy but optional nonkey domain; and • The optional minor-key domain.

17.11.2 TD semantical interpretation R: {(K)↣(y)↣(z)}; R: {RY(y, z); RK(K, y∊RY)}; -- 3NF micro-schema E. Villar THIRD NORMAL FORM FUNDAMENTALS

2016-01-2412:51:18 Page 45 of 76

18.4.1 Minimal class Ᾱ However, a minimal class A it is not an association but a pure Logical class, for example, the set of available bike colors in a factory, i.e. COLOR {Black, White, Red, Blue}.

18.4.2 (A) Associative table (A) Associative table is the zone of Many-to-many associations. The natural key of any associative table has two references. Associative table is without descriptive attributes.

18.4.3 (AR) Reflexive table (AR) Reflexive table is the zone for tables that are referencing its own primary key. This area includes the so called BOM (Bill-OfMaterial) structure assembling (top to bottom) every part with its components. Or vice versa, assembling (bottom to top) every component with all its participated parts. Quick information of this old technology can be found at Wikipedia: "Bill of materials".

18.4.4 Quine triadic association Legend 2: x is the primary key; y is the nonkey domain; z is the minor-key domain. After data warehouses eclosion, a new 3NF pattern emerged. It is the converse of class ℬ, i.e. the 3NF class ℬ⁻¹ that has its corresponding canonical predicate ℬ⁻¹ and its pattern is here for stay, supporting datawarehouse 3NF design.

18.2.1 Current classes of 3NF patterns

As of 2016, we have the following established classes of 3NF predicates:

3NF Class Ᾱ: (primary key)
3NF Class ℬ: (primary key) + nonkey domain
3NF Class ℂ: minor-key domain + (primary key)
3NF Class ⅅ: minor-key domain + (primary key) + nonkey domain
3NF Class ℬ⁻¹: (ℬ nonkey subset) + quantity

18.3 Internet 3NF database challenge

Besides these five 3NF classes, the databases of Internet servers are calling for a new, multimedia-oriented 3NF that mixes prosaic datatypes with text, picture, audio, video and map blobs, makes extensive use of text indexing, and relies on a host of people describing each blob in order to feed each textual index with an irredundant description. This is a very interesting 3NF challenge, because the pattern of this new 3NF relation would be a 3NF schema with some binary relation [1979Cod] inside, i.e. just the opposite of an Optimal third normal form of 1971 [1971Cod]. Moreover, such a Logical 3NF must follow the Physical implementation of blobs in current r-DBMSs. For instance, the blob column J of R (P, F, D, L, T, E, J) is implemented as R: {R (P, F, D, L, T, E); R′(P, J)}.
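A minimal sketch of that Physical split with Python's `sqlite3` module; the table name `r_blob` standing for R′ and the single-letter column names are assumptions. The prosaic columns stay in R, and the blob J moves to a companion table that shares the primary key P.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- R keeps the prosaic columns.
CREATE TABLE r (
    p TEXT NOT NULL PRIMARY KEY,
    f TEXT NOT NULL, d TEXT NOT NULL, l TEXT NOT NULL,
    t TEXT NOT NULL, e TEXT NOT NULL
);
-- R' holds only the blob J, keyed by the same P.
CREATE TABLE r_blob (
    p TEXT NOT NULL PRIMARY KEY REFERENCES r (p),
    j BLOB NOT NULL
);
""")
conn.execute("INSERT INTO r VALUES ('P1', 'f', 'd', 'l', 't', 'e')")
conn.execute("INSERT INTO r_blob VALUES (?, ?)", ("P1", b"\x89PNG\r\n\x1a\n"))  # 8-byte PNG signature

# Joining on P rebuilds R (P, F, D, L, T, E, J) only when the blob is actually needed.
row = conn.execute(
    "SELECT r.p, r.f, length(b.j) FROM r JOIN r_blob AS b ON b.p = r.p"
).fetchone()
print(row)  # ('P1', 'f', 8)
```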

18.4 Ᾱssociative patterns

Canonical predicate Ᾱ offers the following entity patterns: [1] Logical class; [2] Ordered binary relations; [3] Unordered binary relations; [4] Ordered polyadic associations.

Canonical predicate Ᾱ
Description: A standalone primary key. The multiattribute primary key is a set of already existent single primary keys, i.e. predicate subjects of already existent entities. Every attribute of a 3NF class Ᾱ is the subject in its own home set, i.e. each attribute is a foreign key.
Patterns: CLASS (P); ASSO (F∊X·L∊Y); ASSOR (SK, F∊X, L∊Y);


According to Quine [1981Qui], a triadic association, e.g. XYZ (x∊X·y∊Y·z∊Z), is internally a binary one between: • An existent ordered pair 〈y, z〉; and • A new (left-ordered) component x. From now on, x is the left-hand side and the first member of the new binary pair 〈x, 〈y, z〉〉; 〈y, z〉 is the right-hand side and the second member of the new 'binary relation' with three attributes. XYZ is a triadic ordered relation between an entity X, represented by its simple PK (on the left-hand side), and an existent, ordered binary associative 3NF entity YZ of class Ᾱ. Please find the "full melody" of Quine's idea on polyadic associative predicates in the QUINE NORMAL FORM report (below).
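Quine's reading can be mimicked with plain Python sets: each triple (x, y, z) becomes the pair 〈x, 〈y, z〉〉, and its right-hand side must already exist in the binary association YZ. The sample values and function names below are illustrative assumptions.

```python
# Existing binary associative entity YZ of class Ᾱ: ordered pairs <y, z>.
YZ = {("y1", "z1"), ("y1", "z2"), ("y2", "z1")}


def as_binary(xyz):
    """Re-read each triple (x, y, z) as the binary pair <x, <y, z>>."""
    return {(x, (y, z)) for x, y, z in xyz}


def references_existing_pairs(xyz, yz):
    """True if every right-hand side <y, z> already exists in YZ."""
    return all(pair in yz for _, pair in as_binary(xyz))


XYZ = {("x1", "y1", "z1"), ("x1", "y2", "z1"), ("x2", "y1", "z2")}
print(references_existing_pairs(XYZ, YZ))                         # True
print(references_existing_pairs(XYZ | {("x3", "y2", "z2")}, YZ))  # False: <y2, z2> is not in YZ
```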

18.5 ℬread & butter patterns

The 3NF class ℬ is the typical relation: a primary key and a set of attributes holding the properties of the subject, which is always represented by the primary key.

Canonical predicate ℬ
Description: A primary key and its nonkey domain.
Examples: KERNEL (P, (F·L·T·J)); CHARACTERISTIC (〈P∊K, D〉 as PD, (F·L·T·J));

The predicate ℬ supports two RM/T entity types, Kernel and Characteristic, both with single subjects. The kernel type has a single primary key, and the characteristic entity always has a telescopic and ordered primary key because it is part of a characteristic hierarchy.

18.5.1 (K) Kernel pattern

(K) The kernel table is the zone of non-referencing tables. Its natural key is simple and does not reference any table. Obviously, it can be referenced, and it will be referenced at least once; otherwise, the kernel table would also be an orphan table.

18.5.2 Descriptive family The entities whose descriptive attributes reference the same kernel entity K form a descriptive family, i.e. the descriptive family of kernel entity K.

18.5.3 Characteristic table pattern

(CH) The characteristic table is the zone of one-to-many associations. In the ERA terms of Chen [1976Che], a characteristic table is a "weak entity". Each characteristic table has one reference inside its key and can itself be referenced. A set of characteristic tables forms a characteristic hierarchy. The internals of composing a characteristic primary key are explained before the ℂlub & Heath pattern section.
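A minimal SQLite sketch of a one-level characteristic hierarchy, using the DISTRICT(CityId, DistrictId, District) example of 18.11 and Python's `sqlite3` module; the CITY table, the sample rows and the lowercase names are assumptions. The key is the ordered pair 〈reference, discriminator〉, so a district is identified only within its city.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE city (city_id TEXT NOT NULL PRIMARY KEY, city_name TEXT NOT NULL);
-- Characteristic table: the key is <reference to CITY, local discriminator>.
CREATE TABLE district (
    city_id     TEXT NOT NULL REFERENCES city (city_id),
    district_id TEXT NOT NULL,
    district    TEXT NOT NULL,
    PRIMARY KEY (city_id, district_id)
);
INSERT INTO city VALUES ('28', 'Madrid'), ('30', 'Murcia');
-- The same discriminator may repeat under different parents.
INSERT INTO district VALUES ('28', '001', 'Madrid 001'), ('30', '001', 'Murcia 001'),
                            ('28', '230', 'Madrid 230');
""")

try:
    conn.execute("INSERT INTO district VALUES ('99', '001', 'Nowhere 001')")
    orphan_accepted = True
except sqlite3.IntegrityError:
    orphan_accepted = False  # a district cannot exist without its city
print(orphan_accepted)  # False
```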

18.5.4 (G) Grouping table pattern

(G) The grouping table is the zone of M:1 associations and includes simple grouping entities. A kernel or a characteristic entity supplies the "M rows" side of the association, which are summarized.

18.6 Characteristic primary key

Tables with this type of primary key are called "characteristic entities" by Codd [1979Cod] and "weak entities" by Chen [1976Che]. In a data warehouse, each hierarchical dimension is also of characteristic table type, e.g. a perpetual calendar. Composing a characteristic primary key has three aspects: • Tag name; • Definite denotation; and • Composite tag.

18.6.1 Tag name

Class ⅅ is a full descriptive predicate: it has a nonkey domain describing the subject (PK) and a non-empty set of candidate keys denoting its PK. However, ERA one-to-one relationships also exist, sometimes with their own functional claims. Canonical predicate ⅅ swings between interpreting its denotative part as an extended "description" of the subject and representing a "one-to-one ERA relationship".

Canonical predicate ⅅ
Description: A primary key with its nonkey domain and minor-key domain.

A tag is a code meant to be handled by people and machines alike. Tags can originate in the micro-world itself, e.g. the serial number of R2P2, or a Part# shared by the suppliers and customers of robots. A bank AccountNr is a good example of a tag for a commercial agreement whose denotation would otherwise be very long.

Class ⅅ can form small hierarchies of specialization, e.g. MANAGER IsA EMPLOYEE, or of inheritance, e.g. CHESS_PLAYER inherits the set of attributes of EMPLOYEE.

18.6.2 Definite denotation

18.8.1 Embedded 1:1 pattern

Pattern: SPECIAL (P, J, F·L·T·D)

In Logic [1910R&W], a definite denotation is a circumlocution able to identify the subject, e.g. 'the robot with wheels of "The war of stars"'. In database business, definite denotations are composite tags. Many of our composite primary keys are BNF recursive denotations [1959Bac].

An ERA one-to-one relationship between entities R and S can be embedded in one of the two entities. The entity with the larger cardinality, e.g. R, does not change, and the other one, S, receives a foreign key, i.e. the PK of R, which is the denotative reference. Both primary keys together represent the ERA {R} 1:1 {S} relationship.

18.6.3 Composite tag

18.8.2 Denotative table

In database business, definite denotations are ordered composite tags with the implicit form "the nth y of x". For instance, '22203' means "the 3rd son of employee no. 222", and '28230' is "postal district 230 of Madrid". The same information in functor format is: '22203'=TAG("3rd son of the employee 222"); '28230'=TAG("postal district number 230 of Madrid").

A denotative table is like a kernel table with some descriptive alternate keys, i.e. AKs that are not foreign keys. The meaning of a descriptive AK, whether single or multiattribute, is a mandatory business definite description of the primary key.

18.6.4 Recursive tag For having a general way of construing tags: -- x must be an existent tag; x is the . -- y is a short typographic sequence covering the case, e.g. 001999; y is called the . IF IsA THEN CONCAT IsA ; And so on.
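The two examples of 18.6.3 suggest that a composite tag is the concatenation of an existent tag x with a fixed-width local sequence y. A Python sketch under that assumption: the widths (2 digits for the son, 3 for the district) are inferred from the examples rather than stated by the text, and the function names are hypothetical. Because the result is itself a tag, the construction can be applied recursively.

```python
def compose_tag(x, n, width, existing):
    """Tag for "the nth y of x": CONCAT(x, n zero-padded to `width` digits)."""
    if x not in existing:
        raise KeyError(f"{x!r} is not an existent tag")
    if not 0 <= n < 10 ** width:
        raise ValueError(f"{n} does not fit in {width} digits")
    return f"{x}{n:0{width}d}"


def split_tag(tag, width):
    """Inverse reading: (x, n) from a tag whose last `width` characters are n."""
    return tag[:-width], int(tag[-width:])


tags = {"222", "28"}
son = compose_tag("222", 3, 2, tags)        # "the 3rd son of employee 222"
district = compose_tag("28", 230, 3, tags)  # "postal district 230 of Madrid"
tags.add(son)                               # the new tag can be the x of a further level
print(son, district, compose_tag(son, 1, 2, tags))  # 22203 28230 2220301
print(split_tag(district, 3))                        # ('28', 230)
```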

18.7 ℂlub & Heath pattern

The predicate ℂ is useful for representing standalone ERA 1:1 relationships.

Canonical predicate ℂ
Description: A primary key and its minor-key domain.
Examples: ALIAS (P, (F·L), (T·D), J).

3NF class ℂ is the union of Heath's Category 3 and Category 4, for instance "R: {(x, y)→z; (x, z)→y}", which is equivalent to "R: {(x, y)↔(x, z)}"; but before arriving at this equivalence, one of the two overlapping candidate keys will be selected as primary key following the killer
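Heath's pair of dependencies (x, y)→z and (x, z)→y can be tested against sample rows with a short Python function. This is only a sketch: a data instance can refute a functional dependency but never prove it, and the rows below are invented for illustration.

```python
def fd_holds(rows, lhs, rhs):
    """True if, in `rows`, equal values on `lhs` always come with equal values on `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


# Illustrative instance of R(x, y, z) with overlapping candidate keys (x, y) and (x, z).
r = [
    {"x": "E1", "y": "a", "z": 1},
    {"x": "E1", "y": "b", "z": 2},
    {"x": "E2", "y": "a", "z": 1},
]
print(fd_holds(r, ("x", "y"), ("z",)))  # True
print(fd_holds(r, ("x", "z"), ("y",)))  # True
print(fd_holds(r, ("x",), ("y",)))      # False: x alone is not a key
```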

18.7.1 One-to-one denotative pattern

Entities R and S do not change; a separate "associative" entity RS begins with two attributes: the identifier of R and the identifier of S. Each identifier is a foreign key representing an individual of its respective entity and, at the same time, a supervened candidate key of RS.
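Because each of the two identifiers in RS is a supervened candidate key, neither may repeat. A minimal Python check of that property, using the DEPT_MAN(DeptId∊D, EmpId∊E) example listed in 18.11 with invented values; the function name is hypothetical.

```python
def is_one_to_one(rs):
    """True if no identifier of R and no identifier of S occurs twice in RS."""
    rows = list(rs)
    left = [r for r, _ in rows]
    right = [s for _, s in rows]
    return len(set(left)) == len(left) and len(set(right)) == len(right)


# DEPT_MAN(DeptId∊D, EmpId∊E): each department has one manager and vice versa.
print(is_one_to_one([("D1", "E7"), ("D2", "E9")]))  # True
print(is_one_to_one([("D1", "E7"), ("D2", "E7")]))  # False: E7 would manage two departments
```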

18.8 ⅅ: {ℬread & ℂlub UNION} patterns

The class ⅅ is the fusion of a class ℬ and a class ℂ. In other words, class ⅅ is a class ℬ with some denotative touches, i.e. a flat structure oriented to subtables with inclusion constraints.

18.8.3 Molecular relation

This kind of ⅅ pattern also covers the molecular relation [1990Cod], which is the maximal complication of a third normal form. Such complication comes with the many marks of the "property inapplicable" subtype [1990Cod] that an RM molecule needs. However, an RM molecular schema [1990Cod] of irreducible 3NF relations with referential integrity can represent the same type and subtypes as the molecular relation, but without marks and in less physical space than the standalone molecule. Please see UNMARKED NORMAL FORM below.

18.9 ℬ⁻¹: (Dimensional pattern)

(B⁻¹) The dimensional table is a sophisticated entity of the (G) zone. B⁻¹ is the data warehousing zone of M:1 associations and includes grouping DW tables. A dimensional table is the converse of a descriptive kernel or of a characteristic entity: the descriptive attributes now form the dimensional primary key, and only a quantity describes the multiattribute key. SQL has strong DML support for grouping many rows [1991Gra], but it does not yet have DDL supporting DW design. Commercial DW development facilities keep their deliverables in a private dictionary known as "metadata". The metadata documents the grouping of M operational rows into 1 DW row (an M:1 reference that is "orphan" of its own SQL support).
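The converse reading can be illustrated with `collections.Counter`: the former descriptive attributes of an operational class ℬ table become the composite key, and a quantity describes each group of M rows. This is a sketch under assumptions: the attribute names, the sample rows and the choice of a row count as the quantity are illustrative (a sum of some measure would work the same way).

```python
from collections import Counter

# Operational class ℬ table: primary key P described by nonkey attributes F and L.
operational = [
    {"P": 1, "F": "Madrid", "L": "bike"},
    {"P": 2, "F": "Madrid", "L": "bike"},
    {"P": 3, "F": "Murcia", "L": "bike"},
    {"P": 4, "F": "Madrid", "L": "helmet"},
]


def to_dimensional(rows, descriptors):
    """Group M operational rows into one row per descriptor combination (M:1).

    The descriptors become the composite key; the count is the describing quantity."""
    return Counter(tuple(row[a] for a in descriptors) for row in rows)


cube = to_dimensional(operational, ("F", "L"))
for key, q in sorted(cube.items()):
    print(key, q)
```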

18.9.1 3NF Class B⁻¹

3NF class B⁻¹ is the converse of 3NF class B.

Canonical predicate B⁻¹
Description: A composite primary key described by a quantity; each component of the primary key is an old descriptive attribute of the operational database.
Example: R ({F·L·T·D·J}, Q);

The relevance of class B⁻¹ comes from being the support of the irreducible 3NF design of the data warehouse (abbr. DW), a database full of dimensional schemata. The design, the cardinalities and the new technologies of DW databases have challenged the art of database normalization.

18.9.2 DW multidimensional schema

A DW multidimensional schema has a central factual table and a dimensional table for each primary key component. One of the dimensions is usually historical "time", with the day as Unit-Of-Measure (UOM). Every primary key component is a foreign key. The home table of each determinator component is called a 'dimension'; each 'dimension' table is usually the home set of a descriptive family; the descriptor set of the class B table and the determinator components of the class B⁻¹ table share a subset of dimension tables; for instance:

{L(L); D(D); J(J); T(T);     -- some shared dimensions
 P↣(F·L∊L·D∊D·J∊J);          -- class B
 (T∊T·L∊L·D∊D·J∊J)↣Q}        -- class B⁻¹
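A minimal star schema in SQLite, driven from Python's `sqlite3` module and assuming the CUBE(T∊TIME, S∊SPACE, SalesQuant) example of 18.11. The dimension tables `time_dim` and `space_dim`, their `month` and `region` attributes, and all sample values are hypothetical. Each key component of the factual table is a foreign key into its dimension, and a roll-up groups the facts along both dimensions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Dimension tables: each is the home set of one key component.
CREATE TABLE time_dim  (t TEXT NOT NULL PRIMARY KEY, month  TEXT NOT NULL);  -- day as UOM
CREATE TABLE space_dim (s TEXT NOT NULL PRIMARY KEY, region TEXT NOT NULL);
-- Central factual table: one FK per dimension forms the key, described by a quantity.
CREATE TABLE cube (
    t           TEXT    NOT NULL REFERENCES time_dim (t),
    s           TEXT    NOT NULL REFERENCES space_dim (s),
    sales_quant INTEGER NOT NULL,
    PRIMARY KEY (t, s)
);
INSERT INTO time_dim  VALUES ('2016-01-23', '2016-01'), ('2016-01-24', '2016-01');
INSERT INTO space_dim VALUES ('Murcia', 'South'), ('Madrid', 'Centre');
INSERT INTO cube VALUES ('2016-01-23', 'Murcia', 5), ('2016-01-24', 'Murcia', 3),
                        ('2016-01-24', 'Madrid', 7);
""")

# Roll the facts up along both dimension hierarchies (day -> month, city -> region).
rollup = conn.execute("""
SELECT d.month, sp.region, SUM(c.sales_quant)
FROM cube AS c
JOIN time_dim  AS d  ON d.t  = c.t
JOIN space_dim AS sp ON sp.s = c.s
GROUP BY d.month, sp.region
ORDER BY d.month, sp.region
""").fetchall()
print(rollup)  # [('2016-01', 'Centre', 7), ('2016-01', 'South', 8)]
```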

18.9.3 Sixth Normal Form & 3NF Class B⁻¹

Class B⁻¹ matches the Sixth Normal Form (6NF) description: "A regular relvar is in 6NF if and only if it consists of a single key, plus at most one additional attribute." [2005Dat]. This 6NF without a temporal dimension is an instance of our 3NF class B⁻¹. Both categories of this class are irreducible 3NF.

18.10 Palette of primary keys

Each ERA object has its own type of primary key.

ERA object | RM/T entity type | Primary key
Entity | Kernel | Simple PK
Weak entity | Characteristic | 〈tag∊K, discriminator〉
M:N association | Associative | Binary PK of 2 simple FK's
1:1 association | Denotative | Simple PK

The same occurs with the known variations of an RM/T entity.

ERA object | RM/T entity type | Primary key
Reflexive M:N association | Reflexive entity | Binary PK with 2 simple FK's (both being the PK of the same home set)
Referenced M:N association | Referenced associative entity | A surrogate key (abbr. SK)
M:1 Grouping | Grouping entity | A simple PK
M:1 Dimensional | Dimensional entity | One FK per dimension

18.11 Each entity type, its 3NF predicate

Any Logical RM/T entity matches one of the four 3NF classes. There are no more than four 3NF classes, i.e. {A, B, C, D}, and there are no more entity types than the seven explained above, i.e. {A, AR, B⁻¹, CH, D, G, K}.

EXAMPLE | 3NFclass.MODEL | Ent
COLOR(ColorName) | A.CLASS(x) | K
CONTRACT(CustId∊C, AccNr∊A) | A.MN(x∊C, y∊A) | A
EMP(EmpId, Dept∊D, Name, Title, Salary) | B.R(x, y) | K
TITLE(TitId, TitName) | B.R(x, y) | K
DISTRICT(CityId, DistrictId, District) | B.R(〈x¹∊C, x²∊D〉 as CD, N) | CH
BOM(Piece∊P, PartOf∊P, Q) | B.R(x¹∊B, x²∊B) | AR
CUBE(T∊TIME, S∊SPACE, SalesQuant) | B.R⁻¹(y⊊R, x∊R) | B⁻¹
BALANCE(Year∊Kal, Asset, A, Liability, L) | B.R⁻¹(y⊊R, x∊R) | G
DEPT_MAN(DeptId∊D, EmpId∊E) | C.OO(x, z) | D
KONTRACT(SK, CustId∊C, AccNr∊A) | C.KMN(x, z) | AK
DEPT(DepId, DepName, ManId∊E) | D.KNA(x, y, z) | D

Legend. K: Kernel; CH: Characteristic; A: Associative; D: Denotative; AR: Reflexive; G: Grouping; AK: Referenced association.


19. LEGO® DATA DESIGN

LEGO® DATA DESIGN is a very detailed data design guide in prescriptive format. Its objective is designing 3NF entities without any other consideration. In this sense, this guide is complementary to, but also competes with, the HOLISTIC DATA DESIGN. The set of data design rules applies the ideas of Codd [1971Cod] to directly designing "one PM descriptive predicate per relation", forming a database schema of RM/T entities in 3NF.

19.1 LEGO® data design credits

The seed of this chapter, many of its words and its style come from the 1987 paper of Arno Siebes & Martin L. Kersten [1987S&K]. «Customary an entity is introduced as a representative for an individual or thing in reality. The properties of the entity are described by attributes while part of the attributes is essential for its identification. We take an opposite position. Namely, we define an entity-type as nothing more than a name for a set of attribute-names. Thus the characteristic information of an individual or thing is fully described by its attribute-names. The entity-type name itself does not carry additional semantic information» [1987S&K]. Arno Siebes & Martin L. Kersten (1987). The sequence of table design follows that of Chris Date [1986Dat] with some minor retouches. ERA is the acronym of Entity-Relationship-Attribute. It is a well-established data model [1976Che], [1986Zac], [1987S&K], [1992BCN]. A current version of its well-known diagrams can be found in S. Bagui & R. Earp [2011B&E]. The axioms are chained from the known to the unknown, respecting the dependencies among the deliverables. The sections of LEGO® 3NF DESIGN are: • ERA objects; • The 3NF Decalogue; • Entity definition; • Property definition; • Axiom 0: Entity name is neuter; • Axiom 1: Each attribute its semantic; • Axiom 2: Attribute is not null; • Axiom 3: Each entity its attribute set; • Axiom 3⁻¹: Each attribute its entity; • Axiom 4: A standalone identifier per entity; • Axiom 4⁻¹: Only one entity identifier; • Axiom 5: An irredundant identifier; • Axiom 6: FK with explicit specs; • Axiom 7: Maximize descriptive FK's; • Axiom 8: Only a single PK can be FK; • Axiom 9: Minimal nonkey domain.

19.2 ERA objects

RM/T schema design details the set of related data stores that are part of an already existent Data Flow Diagram [1979Y&C], using its own type of diagrams. The main objects of the ERA model are: • Entity (represented by rectangles); • Relationship (represented by lines between rectangles); and • Properties, i.e. the attributes of each entity subject. In our case, the relationships are part of the referencing entity in the already known format of references, i.e. AttributeName∊ENTITY_NAME (AttributeName being a foreign key). Each subject is represented by an identifier, which is also known as the primary key. We keep calling the current entity R. RM rows are now entity cases.


19.3 3NF Decalogue

AN ENTITY ABSTRACTS A SET OF CASES
AN ATTRIBUTE DESCRIBES A PERSON OR A THING
AXIOM 0: ENTITY NAME IS NEUTER
AXIOM 1: EACH ATTRIBUTE ITS SEMANTIC
AXIOM 2: ATTRIBUTE IS NOT NULL
AXIOM 3: EACH ENTITY ITS ATTRIBUTE SET
AXIOM 3⁻¹: EACH ATTRIBUTE ITS ENTITY
AXIOM 4: A STANDALONE IDENTIFIER PER ENTITY
AXIOM 4⁻¹: ONLY ONE ENTITY IDENTIFIER
AXIOM 5: MINIMAL IDENTIFIER
AXIOM 6: FK WITH EXPLICIT SPECS
AXIOM 7: MAXIMIZE DESCRIPTIVE FK'S
AXIOM 8: ONLY A SINGLE PK CAN BE FK
AXIOM 9: MINIMAL NONKEY DOMAIN

19.4 Entity definition An entity is introduced as an abstraction of a set of individuals in the reality.

19.4.1 Entity predicate

An entity R is the name for a set of attributes forming an unordered predicate whose subject is an underlined attribute called "the identifier", "primary key" or "the key", e.g. R (P, F∈C, L∈B, T, E, J); the other attributes (if any) form the predicating part.

19.4.2 Entity property The properties of the entity are described by a set of attributes while part of the attributes (it can be only one) is essential for the identification of each entity case.

19.4.3 Pertinent attribute

Thus the characteristic information of an individual [person, thing or binary relation] is fully described by its pertinent attributes. An attribute is not pertinent to a given entity when it breaks the criterion of avoiding "occurrences of the special null which means value inapplicable" [1979Cod].

19.4.4 Entity notation Primary key, as usual, is underlined. Each "foreign key" goes in Logic notation, i.e. as "ElementOf" another entity, e.g. R.F∊C; "R.F∊C" means that the domain of R.F attribute is the primary key of C table.

19.5 Property definition

(a) A property is a typical attribute of a person or a thing; (b) a property can also be the person or thing itself that has attributes, i.e. the predicate subject; (c) a property can also be one of the two subjects that are part of a binary relation.

19.6 Axiom 0: Entity name is neuter

AXIOM 0 The entity name itself does not carry information on attribute structure. Each entity name plays an important role in communication among all data design protagonists. However, AXIOM 0 establishes and stresses that the entity name does not influence its own attribute structure. For instance, at entity review, each set of attributes is instantiated according to syntactic dependency criteria only (after disclosing the primitive functional dependencies of each entity).


Corollary 0.1 Each entity has a business name shared by final users, data designers and software developers.

19.7 Axiom 1: Each attribute its semantic AXIOM 1 Each attribute has a single nondecomposable semantical interpretation. The AXIOM 1 centers the focus on the attribute as semantic unit. Corollary 1.1 Each attribute has a business name shared by final users, data designers and software developers.

19.7.1 An attribute is irredundant POSTULATE 1 IF each attribute has a single non-decomposable semantical interpretation THEN an attribute is irredundant.

19.8 Axiom 2: Attribute is not null

AXIOM 2 The SQL specification of every attribute includes NOT NULL. Each attribute has an associated DOMAIN or DATATYPE, and it must have a value, not a "missing value db-value" [1990Cod]. COROLLARY 2.1 No attribute is declared "NOT NULL WITH DEFAULT".
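A minimal SQLite sketch of AXIOM 2, written with Python's `sqlite3` module. SQLite has no DB2-style `WITH DEFAULT` clause, so the corollary is expressed here simply by omitting any `DEFAULT`; the EMPLOYEE columns are illustrative, borrowing the JoinDate name used under AXIOM 3, and the helper `accepts` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE employee (
    emp_id    TEXT NOT NULL PRIMARY KEY,
    join_date TEXT NOT NULL  -- every attribute NOT NULL, and no DEFAULT clause
)
""")


def accepts(sql, params=()):
    """Return True if the row is stored, False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


print(accepts("INSERT INTO employee VALUES (?, ?)", ("E1", "2016-01-24")))  # True
print(accepts("INSERT INTO employee VALUES (?, ?)", ("E2", None)))          # False: missing value
print(accepts("INSERT INTO employee (emp_id) VALUES (?)", ("E3",)))         # False: no default to fall back on
```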

19.9 Axiom 3: Each entity its attribute set

AXIOM 3 No two entities can have the same set of attribute names. The spirit of AXIOM 3 is that attribute names should be entity-oriented; for instance, "JoinDate" in the EMPLOYEE table is a better name than a bare "Date". Complying with AXIOM 3 is easy, though: it is enough for one attribute name to differ from those of the reused existent entity. COROLLARY 3.1 The attribute concept can be shared among entities. Recognized business concepts have their own functional space. COROLLARY 3.2 Each entity has a definite description: its own attribute name set. COROLLARY 3.3 An attribute name set cannot be duplicated in the same database schema under the umbrella of a new entity name. COROLLARY 3.3 continues AXIOM 0, complementing its meaning, and prevents any horizontal distribution of a base table. COROLLARY 3.4 Each attribute set is a black box of inseparable attributes. COROLLARY 3.4 prevents any vertical partition of a base table.

19.10 (Axiom 3)⁻¹: Each attribute its entity

(AXIOM 3)⁻¹ Each attribute name has an entity name and a domain name. (Axiom 3)⁻¹ is a formal directive, apart from some interesting administrative tasks related to the reusability of an entity. Corollary 3.1⁻¹ The entire set of attributes of an entity is reusable in another schema of the same database by changing the name of the entity and some of the attribute names; specifically, the names of the components of the primary key must change. Reusing entities starts with this formal protocol, but there also seem to be some good practices associated with this possibility.
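AXIOM 3 and COROLLARY 3.3 can be checked over a schema catalogue by comparing attribute-name sets while ignoring attribute order. A minimal Python sketch; the catalogue format (entity name mapped to a list of attribute names) and the sample entities are assumptions.

```python
def duplicated_attribute_sets(schema):
    """Return (first, second) entity-name pairs that share the same attribute-name set."""
    seen = {}
    clashes = []
    for entity, attrs in schema.items():
        key = frozenset(attrs)  # attribute order carries no information
        if key in seen:
            clashes.append((seen[key], entity))
        else:
            seen[key] = entity
    return clashes


schema = {
    "EMPLOYEE": ["EmpId", "Name", "JoinDate"],
    "CONTRACTOR": ["JoinDate", "EmpId", "Name"],  # same set under a new name
    "CUSTOMER": ["CustId", "Name"],
}
print(duplicated_attribute_sets(schema))  # [('EMPLOYEE', 'CONTRACTOR')]
```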


For instance, the ADDRESS entity can appear in different schemas of a company: the CUSTOMER, SUPPLIER and EMPLOYEE schemata may each have a customized ADDRESS entity. Reusability of data design deliverables has a formal protocol, the known dimension of optimizing resources, and sometimes the normal support of company standards or legal regulations.

19.11 Axiom 4: A standalone identifier per entity

«One and only one element is independent (the governor)» [1970Rob]. AXIOM 4 No two entities in the same schema can have the same identifier name or the same active domain. AXIOM 4 continues the idea of AXIOM 3, emphasizing the Logical importance of having a unique binomial "Identifier (value)" across each database schema. AXIOM 4 prohibits database vertical partitions. AXIOM 3 plus AXIOM 4 support schemata with unique: , , and the binomial " (value)" that connect all with the reality.

Denotative entity implements ERA 1:1 relationship

AXIOM 4 disallows reusing the identifier name of the family father for the remaining primary keys of its denotative tables. To comply with AXIOM 4, each subtype entity, e.g. SECRETARY, must have its own identifier plus the identifier of its supertype, e.g. EMPLOYEE. The identifier of the supertype will be a referential attribute in the subtype entity, but not the subtype's own identifier. The foreign key role is functionally necessary, whereas the candidate key role is supervened, being the only way of implementing an ERA 1:1 relationship. UNMARKED NORMAL FORM is AXIOM 4 at work. In the previous case, there is a wanted side effect: the schema optimizer, in its phase of looking for vertical partitions, will respect the denotative families without confusing a pair of denotative tables with a database vertical partition defect.

19.12 (Axiom 4)⁻¹: Only one entity identifier

(AXIOM 4)⁻¹ Each entity has "no more than one" [1990Cod] identifier. After the identifier is recognized or discovered, the remaining candidate keys (if any) sink into the descriptive attribute subset, denoting the identifier. Each alternate key will thus fulfill its denoting role without an explicit SQL declaration. (AXIOM 4)⁻¹ prohibits wasting designer and physical resources on a uniqueness functionality that the entity already has.

19.13 Axiom 5: Minimal identifier

AXIOM 5 The identifier is an attribute or a minimal irredundant subset. AXIOM 5 is already explicit in the 1NF definition. A 1NF entity always had a minimal irredundant composite identifier, being the source of every identifier in the normalization process. AXIOM 5 bans any kind of redundancy inside a multiattribute identifier, i.e. AXIOM 5 establishes the minimality of the PK that comes from its uniqueness; in turn, minimality can only apply to an irredundant multiattribute of R; finally, an irredundant multiattribute is free of internal dependence.
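The uniqueness-then-minimality chain of AXIOM 5 can be tested from the functional dependencies: an identifier must determine every attribute, and no proper subset of it may do so. A Python sketch, assuming FDs are given as pairs of frozensets; the order-line attributes are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_minimal_identifier(pk, attributes, fds):
    """True if `pk` determines every attribute and no proper subset of it does."""
    everything = set(attributes)
    if not closure(pk, fds) >= everything:
        return False  # not even unique
    return not any(
        closure(sub, fds) >= everything
        for size in range(1, len(pk))
        for sub in combinations(pk, size)
    )


attrs = ["OrderNr", "LineNr", "Product", "Qty"]
fds = [(frozenset({"OrderNr", "LineNr"}), frozenset({"Product", "Qty"}))]
print(is_minimal_identifier(("OrderNr", "LineNr"), attrs, fds))             # True
print(is_minimal_identifier(("OrderNr", "LineNr", "Product"), attrs, fds))  # False: Product is redundant
```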


This chain from uniqueness to minimality of the identifier is typical of 1NF relations, but not only of them: UNF databases (and physical files) with periodic groups also have minimal business primary keys.

19.14 Axiom 6: FK with explicit specs

AXIOM 6 Specifying each known foreign key is mandatory. AXIOM 6 rules out any undocumented foreign key in the referencing entity, i.e. any hidden foreign key attribute as part of a referencing entity. AXIOM 6 helps prevent partial dependence and transitive dependence of the corresponding attributes in referencing entities.

19.15 Axiom 7: Maximize descriptive FK's

AXIOM 7 Every descriptive attribute of the nonkey domain whose active domain has more than four values is a perfect candidate for creating its home set. Corollary 7.1 Zero orphan entities: every entity of a plural 3NF schema is a referencing entity, a referenced entity, or both. Corollary 7.2 Maximizing references increases the cohesion of the entities, bearing in mind that the cohesion/price balance improves in proportion to the entities' cardinality. An FK that is a component of a composite primary key is part of the semantics of the design, and the designer cannot bypass its specification. However, creating a new kernel entity to hold the set of possible values of a given (non-Boolean) descriptive attribute is one of the routine decisions of the entity author. From the point of view of the 3NF compliance of an entity at minimal energetic cost, promoting a pure descriptive attribute to a foreign key is the choice.
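A minimal Python sketch of the AXIOM 7 heuristic. It assumes the active domain can be approximated by the distinct values currently stored, and the EMPLOYEE rows are invented. An attribute with more than four distinct values is flagged as a candidate for its own kernel home set, while a Boolean attribute never qualifies.

```python
def home_set_candidates(rows, descriptive, threshold=4):
    """Descriptive attributes whose active domain has more than `threshold` values."""
    return [a for a in descriptive if len({row[a] for row in rows}) > threshold]


titles = ["Clerk", "Analyst", "Manager", "Engineer", "Director"]
employees = [
    {"EmpId": i, "Title": t, "Active": i % 2 == 0}  # Active is Boolean: 2 values
    for i, t in enumerate(titles)
]
print(home_set_candidates(employees, ["Title", "Active"]))  # ['Title']
```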

19.16 Axiom 8: Only a single PK can be FK

«We remark in passing that primary keys and foreign keys therefore always consist of a single attribute in RM/T never a combination of attributes» CJ Date [2000Dat]. AXIOM 8 Only a single identifier can be part of the set of attributes of a referencing entity. AXIOM 8 prevents the specification of a composite reference. AXIOM 8 also prevents an attribute that is not the PK of a "referenced" entity, for instance a single alternate key, from being part of another "referencing" entity. AXIOM 8 points out that using single attributes as the only references will construct friendly 3NF entities, but it is not a claim that this is the only way of designing 3NF entities. AXIOM 8 influences the design of referenced associative entities, which will need a surrogate key to be AXIOM 8 compliant. COROLLARY 8.1 The name of the identifier of associative entities falls under AXIOM 2 immediately after surrogate creation.


COROLLARY 8.2 Referencing a composite identifier of a characteristic entity according to AXIOM 8 means defining the composite identifier as if it were a monoattribute, e.g. AB(A8)εR, accompanied by a redefinition, e.g. AB(A8)εR AS (A(A4), B(N4)), on both sides: in the referenced entity and in the referencing entity. These redefinitions are trivial for experienced programmers. AXIOM 8 prevents referenced entities from cooperating in the partial dependence or the transitive dependence of the referencing entities. The Minimal PK discovery section contains all the information related to AXIOM 8.
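Reading A8, A4 and N4 as COBOL-style picture lengths (an assumption: 8 alphanumeric characters, 4 alphanumeric, 4 numeric), the redefinition AB(A8) AS (A(A4), B(N4)) amounts to packing and unpacking a fixed-width string. A Python sketch with hypothetical function names and sample values:

```python
def pack_ab(a, b):
    """Compose the single 8-character reference AB from A (4 chars) and B (4 digits)."""
    if len(a) != 4:
        raise ValueError("A must be exactly 4 characters")
    if not 0 <= b <= 9999:
        raise ValueError("B must fit in 4 digits")
    return f"{a}{b:04d}"


def unpack_ab(ab):
    """Redefine AB as (A, B): the inverse of pack_ab."""
    if len(ab) != 8 or not ab[4:].isdigit():
        raise ValueError("AB must be 4 characters followed by 4 digits")
    return ab[:4], int(ab[4:])


ab = pack_ab("MADR", 230)
print(ab, unpack_ab(ab))  # MADR0230 ('MADR', 230)
```

The referencing entity stores only AB, a single attribute, as AXIOM 8 requires, and either side can unpack it when the components are needed.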

19.17 Axiom 9: Minimal nonkey domain

AXIOM 9 The nonkey domain is optional, but if it exists, it must be semantically complete, irredundant and minimal. AXIOM 9 asks for the complete set of properties of the data store specifications to be reflected. The irredundancy of AXIOM 9 bans any kind of redundancy inside the multiattribute nonkey subset, i.e. nonkey irredundancy rules out transitive dependence in the database implementation of the current entity! The minimality of AXIOM 9 rules out partial dependence in the database implementation of the current entity! Irredundancy and minimality of an existing nonkey domain mean that R is already in 3NF; otherwise, AXIOM 9 asks the designer to reach that nonkey design qualification level by performing the functional dependency analysis of the current entity. If some nonkey attributes of the R specs are discarded, this means that the ERA specs were not in 3NF. In such a case, we must retrofit the data store specifications, i.e. re-specify the current entity and create the corresponding new entities. Please remember that descriptive foreign keys cannot be discarded from the nonkey domain. COROLLARY 9.1 We do not want to increase the number of nonkey attributes or, worse, to add some "ForFutureUse" attribute names. AXIOM 9 is not applicable to entities without a nonkey subset, i.e. a class entity, an associative entity or a denotative entity. COROLLARY 9.2 Key attributes cannot be discarded. Only the nonkey domain can be reduced in order to get a 3NF entity, and the reduction can only affect immediate attributes (not foreign keys). The reduction takes the form of creating new 3NF entities, e.g. i. a key part A, now also a foreign key in R, i.e. A∊RA, plus the nonkey attribute(s) of R depending on A, as RA (A, ...); ii. a nonkey part B, now also promoted to foreign key, i.e. B∊RB, plus the nonkey attribute(s) of R depending on B, as RB (B, ...).
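Case i can be sketched in Python on rows represented as dicts; the function `split_off` and the sample data are hypothetical. The dependents of the key part move to the new entity RA, the key part stays in R where it now also acts as a foreign key, and R loses only the immediate attributes that were moved.

```python
def split_off(rows, determinant, dependents):
    """Move `dependents` of `determinant` into a new entity RA and keep a reduced R.

    The determinant stays in R, where it now acts as a foreign key into RA."""
    ra = {tuple(row[a] for a in determinant + dependents) for row in rows}
    kept = [a for a in rows[0] if a not in dependents]
    reduced = {tuple(row[a] for a in kept) for row in rows}
    return kept, reduced, ra


# Case i: key part EmpId determines EmpName, a partial dependence in R.
r = [
    {"EmpId": "E1", "ProjId": "P1", "EmpName": "Ana", "Hours": 10},
    {"EmpId": "E1", "ProjId": "P2", "EmpName": "Ana", "Hours": 5},
    {"EmpId": "E2", "ProjId": "P1", "EmpName": "Luis", "Hours": 7},
]
kept, reduced, ra = split_off(r, ["EmpId"], ["EmpName"])
print(kept)        # ['EmpId', 'ProjId', 'Hours']
print(sorted(ra))  # [('E1', 'Ana'), ('E2', 'Luis')]
```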

eXtreme normalization repairs each design flaw of such a 3NF table, for example a 3NF R with many "occurrences of the special null" [1979Cod] (abbr. u-mark) meaning "value inapplicable" (abbr. "i-mark") [1990Cod], by replacing it with a non-empty set of irreducible 3NF tables without marks. The eleven chapters under the eXtreme normalization umbrella are: • QUINE NORMAL FORM, • ONTOLOGIC NORMAL FORM, • INCLUSION NORMAL FORM, • UNMARKED NORMAL FORM, • LACONIC NORMAL FORM, • OCCAM NORMAL FORM, • UNION NORMAL FORM, • NATURAL NORMAL FORM.

20.1 Overview of eXtreme normalization

A cross-reference between each 3NF schema design defect and its eXtreme normalization follows.

3NF schema design defect | eXtreme normalization
Orphan table | Ontologic normal form
Inclusion dependence | Inclusion normal form
i-marks, u-marks, d-value | Unmarked normal form
Ambiguous description | Laconic normal form
Unnecessary associative entities | Occam normal form
Database horizontal partition | Union normal form
Database vertical partition | Natural normal form
Hidden functional dependency | Schema normalization

20.1.1 Quine normal form specialty

Quine normal form, besides being part of eXtreme normalization, has its own place as a real application of Quine's insightful ideas on directly designing polyadic 3NF relations of class A, even if the optimal 3NF concept did not exist. Quine normal form has: (a) a defined entry in the form of the specifications of an ERA many-to-many relationship among more than two partner entities; (b) a defined target, i.e. a 3NF relation of class A (an RM/T associative table); and (c) a step-by-step procedure transforming the ERA specs into an RM/T associative entity of any degree.

20.1.2 Schema normalization

Hidden functional dependency is a 3NF schema design defect related to a vertical database partition, which is transparent to Natural normal form but not to Schema normalization, part of the DATABASE SCHEMA OPTIMIZATION report. Hence, eXtreme normalization is a complement to Classic normalization that has come to stay, but Database Schema Optimization is the only place for discovering (and repairing) hidden functional dependencies (if any), apart from having its own topics and housekeeping tasks. Standard normalization certifies the 3NF status of the current 1NF table or, by refactoring it, gives a set of 3NF tables. eXtreme normalization recognizes some reducible 3NF pattern in the current R and improves it in the form of a set of irreducible 3NF tables (abbr. i3NF). Database schema optimization looks for hidden functional dependencies and wisely repairs them (if any). Only then is the whole set of tables of the original schema in Optimal third normal form (abbr. o3NF).

20. eXTREME NORMALIZATION

The next chapters are part of eXtreme normalization, which mainly means refactoring a table R already in third normal form.



The whole view can be appreciated in the following picture.

20.5 Thesis of eXtreme normalization

The thesis underlying all the proposals of eXtreme normalization is that a database schema whose tables are in third normal form may still have design flaws, for example too many marks of the "inapplicable property" type in R, or some vertical database partitions R¹ and R² of an ideal R, i.e. a defect important enough to consider that the schema itself is not in "Optimal third normal form" [1974Cod].

21. QUINE NORMAL FORM

Overview of Normalization

20.2 eXtreme normalization mental map Each eXtreme normal form shares the following context:
[1] 3NF normalization has a clear design input in the form of two well-defined design defects: Partial dependence and Transitive dependence.
[2] 3NF normalization is also a procedure (which we call the ThirdNF method, abbreviated 'TNF proc'). It detects any number of the mentioned design flaws in a given table R, always taking advantage of Functional dependency analysis.
[3] The 3NF normalization procedure has a method for refactoring the current transitive FD sub-structures of R.
[4] Finally, the 3NF normalization procedure has a sound semantic interpretation of any number of FD sub-structures of any given table R, yielding a non-empty schema of 3NF relations.

20.3 eXtreme design credits Third normal form is good at guaranteeing "zero defect data", being itself "zero defect data design". So, following Beck's [1999Bec] eXtreme programming lemma "If something is good", we apply the 3NF design integrity rules continuously. This includes entities that are already in 3NF but have some design defect, provided another feasible 3NF design without that conspicuous defect exists. Such a final 3NF design is called irreducible [2000Dat] Third normal form (as already stated).

20.4 Unsurpassable third normal form Assuming the previous statements, we admit that Third normal form is unsurpassable. In affirmative terms, this means that an eXtreme normal form is a specialized variation of some "irreducible Third normal form" [2000Dat]. This statement implies the following corollaries: i. there are reducible 3NF tables that merit a reduction; and ii. any database schema containing a reducible 3NF table is not optimal [1974Cod].

20.4.1 Some irreducible third normal forms This is the case for Ontologic normal form, Inclusion normal form, Unmarked normal form, Laconic normal form, Occam normal form, Union normal form, and Natural normal form. All such "normal forms" are ad-hoc methods sharing two properties: #1. given a non-empty set of 3NF tables, the method gives another non-empty set of irreducible 3NF tables without semantic loss; and #2. the entry unit of each eXtreme method is some reducible 3NF relation R open to some data design enhancement.


Quine Normal Form is the general method for representing a polyadic associative table of any degree in the format of a dyadic associative table. The method applies Quine's analysis of polyadic relations as a set of nested binary associations. It places the single primary key of a table on the left-hand side and a surrogate of an associative table on the right-hand side. This mechanism is reused recursively, progressively increasing the information of the right-hand side associative table. The resulting irreducible 3NF associative relation is a general relational design. It implements any M:N relationship among an unrestricted number of ERA entities without introducing any polyadic noise, because it is built in the ordered, formal way that Quine finally found in 1981 [1954Qui], [1981Qui]. As its name suggests, the functional ordering of the associated tables will be better known by the data designer than by anyone else. The Quine normal form report has the following sections: • Ordered pair; • Quine theory on polyadic relations; • 3NF tetradic example; • QNF polyadic analysis; • Hidden derivable association; • QNF associative method; • The selected binary seed; • Selecting the first referent; • Adding the last referent; • Checking QNF composition; and • Quine normal form of KACP table.

21.1 Ordered pair «Of the two members of a pair, a determinate one is generally the first, and the other the second; so that if the order is reversed, the pair is not considered as remaining the same» [1881Pei]. (a, b) is an unordered pair [1974Dea] and 〈a, b〉 is an ordered pair [1961Kur], [1974Dea]. Here 〈a, b〉 is an element of U² [1910R&W], a is called the referent subject [1910R&W], and b is the relatum subject [1910R&W]. "The class ((a, b), (a)) is an ordered pair whose first element is a and whose second element is b" [1921Kur].
〈a, b〉 ≝ ((a, b), (a));
〈x, y〉 ≝ {{x}, {x, y}}; the class 〈x, y〉 is an ordered pair whose first element is x and whose second is y [1954Qui].
Definition 8: Ordered pair
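Kuratowski's construction can be checked mechanically. The Python sketch below is illustrative only. It assumes hashable, atomic members and encodes 〈x, y〉 = {{x}, {x, y}} with `frozenset`. The helper names `kpair`, `first` and `second` are hypothetical. The sketch shows that swapping the members yields a different pair, which is Peirce's requirement, and that both members can be recovered.

```python
def kpair(x, y):
    """Kuratowski ordered pair <x, y> = {{x}, {x, y}}."""
    return frozenset({frozenset({x}), frozenset({x, y})})


def first(pair):
    """The first element is the one shared by every member of the pair."""
    (x,) = frozenset.intersection(*pair)
    return x


def second(pair):
    """The second element is the remaining one, or the first itself when x == y."""
    x = first(pair)
    rest = frozenset.union(*pair) - {x}
    return next(iter(rest)) if rest else x


p = kpair("a", "b")
print(first(p), second(p))                 # a b
print(kpair("a", "b") == kpair("b", "a"))  # False
print(kpair("a", "a"))                     # frozenset({frozenset({'a'})})
```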


"Let us assume a fragment of set theory, adequate to assure the existence, for all x and y without regard to logical type, of the set {x, y} whose members are x and y, and to assure the distinctness of x from {x, y} and {{x}}. Let us construe in Kuratowski's fashion [1920Kur], the ordered pair 〈x, y〉 as {{x}, {x, y}}" [1954Qui] (see also [1961Kur]).

21.2 Quine theory on polyadic relations «The theory of dyadic relations provides a convenient basis for the treatment also polyadic cases, e.g. (y is between z and w) and (x pays y to z for w). A triadic relation among elements y, z, and w might be conceived as a dyadic relation borne by y to z;w. Tetradic relations could be handled on the basis of triadic ones in similar fashion. Similarly for pentadic relations, hexadic ones, and so on» [1981Qui]. The reader already knows Quine's theory as applied to the definition of the maximal independent subset (miß) (see From binary to polyadic relations). Quine normal form is also an application of binary relation Logic [1981Qui], but now for designing polyadic associative RM/T entities, i.e. useful and safe associative schemata of tables in irreducible 3NF of class A. This design technique construes polyadic relations by adding the n-ary atomic referent to a meaningful, ordered polyadic relatum [1958Qui] represented by a surrogate key.
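Quine's reduction can be pictured as right-nesting. A tetradic fact 〈x, y, z, w〉 becomes the dyadic pair 〈x, 〈y, 〈z, w〉〉〉, in which each inner pair is the relatum of the outer referent. The Python sketch below is only an illustration. It assumes that the individual terms are atomic (not themselves tuples), and the names `to_dyadic` and `from_dyadic` are hypothetical.

```python
def to_dyadic(terms):
    """Nest an n-ary tuple (n >= 2) into dyadic pairs: (a, b, c, d) -> (a, (b, (c, d)))."""
    if len(terms) < 2:
        raise ValueError("a relation needs at least two terms")
    if len(terms) == 2:
        return (terms[0], terms[1])
    return (terms[0], to_dyadic(terms[1:]))


def from_dyadic(pair):
    """Flatten nested dyadic pairs back into the original n-ary tuple.

    Assumes atomic terms, so a tuple in the relatum position is always a nested pair.
    """
    referent, relatum = pair
    if isinstance(relatum, tuple):
        return (referent,) + from_dyadic(relatum)
    return (referent, relatum)


# "x pays y to z for w", read as a dyadic relation borne by x to <y, <z, w>>
fact = ("x", "y", "z", "w")
print(to_dyadic(fact))  # ('x', ('y', ('z', 'w')))
```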

21.3 3NF tetradic example The tabulation KACP, with four key attributes, follows.

KACP
City          Agent   Company   Product
Cedar Rapids  Brown   TOYOTA    bus
Cedar Rapids  Brown   TOYOTA    car
Cedar Rapids  Fisher  CHRYSLER  truck
Cedar Rapids  Fisher  FORD      truck
Cedar Rapids  Fisher  GM        truck
Cedar Rapids  Jones   GM        car
Cedar Rapids  Jones   GM        truck
Cedar Rapids  Smith   FORD      car
Cedar Rapids  Smith   FORD      truck
Davenport     Brown   CHRYSLER  car
Davenport     Brown   FORD      car
Davenport     Brown   GM        car
Davenport     Brown   TOYOTA    car
Davenport     Fisher  FORD      car
Davenport     Fisher  FORD      truck
Davenport     Jones   GM        car
Davenport     Jones   GM        truck
Davenport     Smith   FORD      car
Davenport     Smith   FORD      truck
Des Moines    Brown   CHRYSLER  car
Des Moines    Brown   FORD      car
Des Moines    Brown   GM        car
Des Moines    Brown   TOYOTA    car
Des Moines    Jones   GM        car
Des Moines    Jones   GM        truck
Des Moines    Smith   FORD      car
Des Moines    Smith   FORD      truck
Des Moines    Truman  TOYOTA    bus
Des Moines    Truman  TOYOTA    car
Sioux City    Brown   CHRYSLER  car
Sioux City    Brown   FORD      car
Sioux City    Brown   GM        car
Sioux City    Brown   TOYOTA    car
Figure 6. 3NF tetradic example

The tabulation has car companies making several types of products, which are sold by agents.

There are two different types of agents. Some of them are company representatives, e.g. "Truman represents TOYOTA company". Others sell a generic product of any company, for instance, "Brown sells cars". In a given city, each agent either represents all the products of a company or is an agent specialized in a type of product of any company.

21.4 QNF polyadic analysis The enriched atomic projections of the original tabulation follow (Figure 7).
(K) City: Cedar Rapids, Davenport, Des Moines, Sioux City, Waterloo
(A) Agent: Brown, Grey, Jones, Smith, Truman, Fisher
(P) Product: bus, car, truck, van
(C) Company: CHRYSLER, FORD, GM, TOYOTA, MERCEDES
Figure 7. QNF enriched atomic projections

21.5 Hidden derivable association Having the column projections of a triadic or tetradic association allows checking whether the entry tabulation is a proper subset of the Cartesian product of those projections or the whole product. In the latter case we find a Cartesian product, i.e. a derivable n-ary association. In other words, we face a perfect SQL view that has been permanently materialized. If the hidden derivable association is not a flaw of the current tabulation, then the tabulation must be returned to its owner. The owner can then delete the associative table from the test directory and create the corresponding view. A normal preventive way of ruling out a Cartesian product in an associative tabulation is the one we have already applied to the column projections of Figure 7 (QNF enriched atomic projections). It consists simply of adding a new entry to every column, for instance, the value 'MERCEDES' in the last cell of the Company column.
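The check described above can be sketched in Python. This is a minimal illustration, not the author's procedure, and the function name `is_derivable` is hypothetical. It compares a set of rows with the Cartesian product of either the tabulation's own column projections or a supplied list of (enriched) domains.

```python
from itertools import product


def is_derivable(rows, domains=None):
    """Return True when the tabulation is exactly a Cartesian product.

    Without `domains`, the product of the tabulation's own column projections is used.
    With `domains` (e.g. enriched atomic projections), the product of those sets is used.
    """
    rows = set(map(tuple, rows))
    if not rows:
        return False
    if domains is None:
        domains = [set(column) for column in zip(*rows)]
    return rows == set(product(*domains))


# A Company-Product tabulation that is the full product of its projections
full = [(c, p) for c in ("FORD", "GM") for p in ("car", "truck")]
print(is_derivable(full))  # True

# Adding 'MERCEDES' to the Company domain rules the product out
print(is_derivable(full, [{"FORD", "GM", "MERCEDES"}, {"car", "truck"}]))  # False
```

With enriched domains, a tabulation can only equal the product if it also contains rows for the added values. That is why the extra entries act as a preventive guard.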

21.6 QNF associative method QNF is the name of a design procedure yielding RM/T polyadic associative tables of high design quality.

Figure 8. QNF tetradic example


In Figure 8, let us analyze (D·T·L·P) as {D (D, …); T (T, …); L (L, …); P (P, …)}. Let R (L·P) be the selected seed of the future tetradic RM/T associative table. R must then be surrogated: R (x, (L·P)) gives the surrogate x, to which the first referent T is added, giving TLP (y, (T·x)). Finally, DTLP (D·T·L·P) takes the form DTLP (D, y). The generic QNF method can be explained by the schema of Figure 8: #1. a SURROGATIVE category, e.g. (x, ⟨L·P⟩), represents the "seed" dyadic table, e.g. (L·P); #2. a SURROGATIVE category, e.g. (y, ⟨T·x⟩), represents a triadic table, e.g. (T·L·P); #3. a DYADIC category, e.g. (D·y), represents the TETRADIC category (D·T·L·P); #4. and so on. QNF is the practical support of AXIOM 8 of the LEGO® DESIGN GUIDE, which states "Only a single identifier can be part of the set of attributes of a referencing entity".
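The generic method can be sketched in Python as repeated surrogation. This is a hedged illustration, not the document's definitive procedure. It assumes the designer has already ordered the columns so that the seed is the last pair, e.g. (City, Agent, Company, Product). The surrogate names `k0_0`, `k1_0`, … are hypothetical placeholders for mnemonic codes such as Fc or SFc.

```python
def qnf_decompose(rows):
    """Split non-empty n-ary rows (n >= 2) into a chain of dyadic tables, seed first.

    Returns (levels, final): each level maps surrogate -> (referent, relatum),
    and `final` holds the pairs (first column, top-level surrogate).
    """
    rows = [tuple(r) for r in rows]
    n = len(rows[0])
    relata = [r[-1] for r in rows]          # the term standing for each row's suffix
    levels = []
    for j in range(n - 2, 0, -1):           # seed column first, then each further referent
        pairs = [(r[j], rel) for r, rel in zip(rows, relata)]
        surrogate = {}
        for pair in pairs:                  # one new surrogate per distinct pair
            surrogate.setdefault(pair, f"k{len(levels)}_{len(surrogate)}")
        levels.append({key: pair for pair, key in surrogate.items()})
        relata = [surrogate[pair] for pair in pairs]
    final = {(r[0], rel) for r, rel in zip(rows, relata)}
    return levels, final


def qnf_recompose(levels, final):
    """Undo qnf_decompose by following each surrogate down to the seed."""
    rows = set()
    for head, key in final:
        row = [head]
        for level in reversed(levels):
            referent, key = level[key]
            row.append(referent)
        row.append(key)
        rows.add(tuple(row))
    return rows


rows = [
    ("Davenport", "Brown", "FORD", "car"),
    ("Des Moines", "Brown", "FORD", "car"),
    ("Davenport", "Smith", "FORD", "car"),
    ("Davenport", "Smith", "FORD", "truck"),
    ("Des Moines", "Truman", "TOYOTA", "bus"),
]
levels, final = qnf_decompose(rows)
cp, acp = levels
print(len(cp), len(acp), len(final))  # 3 4 5
```

Here `cp` plays the role of the seed table (x, ⟨L·P⟩), `acp` the role of (y, ⟨T·x⟩), and `final` the dyadic (D·y).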

21.7 The selected binary seed Let us select Company-Product as Quine's seed. It is the dyadic projection with a generally acceptable two-way meaning: "Company z makes product w" and "Product w is made by company z". Having Company-Product as seed, we add the corresponding surrogate, called CPk in our example (Figure 9).

CP
CPk  Company   Product
Cc   CHRYSLER  car
Cv   CHRYSLER  van
Ct   CHRYSLER  truck
Fc   FORD      car
Ft   FORD      truck
Gc   GM        car
Gt   GM        truck
Tb   TOYOTA    bus
Tc   TOYOTA    car
Figure 9. Quine's seed is CP table

21.8 Selecting the first referent The second Quine decision is selecting the attribute best associated with the dyadic seed, i.e. the CP table. This attribute becomes the third subject of the future tetradic associative table. Agent seems a good choice because it is associated with both components of the CP projection: "An agent represents a company (with all its products)" and "All the products of a company are represented by an agent". Now it is a matter of matching Agent with the surrogate of Company-Product, and assigning surrogates to the new table ACP (Figure 10). The list of agents corresponds to the set of the Agent-Company-Product projection of the original tabulation.

The surrogated dyadic projection Agent-CPk represents the triad {Agent-Company-Product} (Figure 8 and Figure 11).

ACP
ACPk  Agent   CPk
BCc   Brown   Cc
BFc   Brown   Fc
BGc   Brown   Gc
BTb   Brown   Tb
BTc   Brown   Tc
FCt   Fisher  Ct
FFc   Fisher  Fc
FFt   Fisher  Ft
FGt   Fisher  Gt
JGc   Jones   Gc
JGt   Jones   Gt
SFc   Smith   Fc
SFt   Smith   Ft
TTb   Truman  Tb
TTc   Truman  Tc
Figure 11. A dyadic table representing a triad

ACP still needs a surrogate key (ACPk) so that the following binary associative level can use a single reference.

21.9 Adding the last referent K.City is the last referent of a pure dyadic association between its kernel table, i.e. K (City), and the relatum ACP.ACPk of the ACP (ACPk, Agent, CPk) associative table. The subject of ACP is a surrogate representing the three already associated subjects: A.Agent, C.Company and P.Product.

KACP
City          ACPk
Davenport     BCc
Davenport     BFc
Davenport     BGc
Davenport     BTc
Des Moines    BCc
Des Moines    BFc
Des Moines    BGc
Des Moines    BTc
Sioux City    BCc
Sioux City    BFc
Sioux City    BGc
Sioux City    BTc
Davenport     FFc
Davenport     FFt
Cedar Rapids  FFt
Cedar Rapids  FGt
Davenport     JGc
Davenport     JGt
Des Moines    JGc
Des Moines    JGt
Cedar Rapids  JGc
Cedar Rapids  JGt
Davenport     SFc
Davenport     SFt
Des Moines    SFc
Des Moines    SFt
Cedar Rapids  SFc
Cedar Rapids  SFt
Des Moines    TTb
Des Moines    TTc
Cedar Rapids  TTb
Cedar Rapids  TTc
Figure 10. A tetradic association as a binary relation.

The fourth and definitive dyadic association in Quine normal form is the KACP table. It has the same set of rows (and the same ordering) as the original {Agent, City, Company, Product} table.
Figure 10. Construing the second surrogative table



21.10 Checking QNF composition

Two simple natural joins upon Quine's associative hierarchy recover the rows of the original tabulation KACP:
SELECT City, Agent, Company, Product
FROM ((KACP NATURAL JOIN ACP) NATURAL JOIN CP)
ORDER BY City, Agent, Company, Product;
The query report defined by this SQL code recovers, as usual, the original 3NF polyadic tabulation KACP.
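The composition check can be run end to end with SQLite through Python's standard `sqlite3` module. SQLite is an assumption here, chosen only as a convenient runnable engine. The sketch loads a few rows of Figure 6 level by level and uses integer surrogates in place of the mnemonic codes (Fc, SFc, …). It then confirms that the natural-join query returns exactly the loaded rows. SQL evaluates a chain of joins from left to right, so the unparenthesized FROM clause below is equivalent to the parenthesized query in the text.

```python
import sqlite3

# A few rows of Figure 6: (City, Agent, Company, Product)
ROWS = [
    ("Davenport", "Brown", "FORD", "car"),
    ("Davenport", "Smith", "FORD", "car"),
    ("Davenport", "Smith", "FORD", "truck"),
    ("Des Moines", "Brown", "FORD", "car"),
    ("Des Moines", "Truman", "TOYOTA", "bus"),
]

DDL = """
CREATE TABLE CP   (CPk  INTEGER PRIMARY KEY, Company TEXT NOT NULL, Product TEXT NOT NULL,
                   UNIQUE (Company, Product));
CREATE TABLE ACP  (ACPk INTEGER PRIMARY KEY, Agent TEXT NOT NULL,
                   CPk INTEGER NOT NULL REFERENCES CP, UNIQUE (Agent, CPk));
CREATE TABLE KACP (City TEXT NOT NULL, ACPk INTEGER NOT NULL REFERENCES ACP,
                   PRIMARY KEY (City, ACPk));
"""

QUERY = """
SELECT City, Agent, Company, Product
FROM KACP NATURAL JOIN ACP NATURAL JOIN CP
ORDER BY City, Agent, Company, Product
"""


def load(conn, rows):
    """Insert each row level by level, reusing a surrogate when its pair already exists."""
    for city, agent, company, product in rows:
        conn.execute("INSERT OR IGNORE INTO CP (Company, Product) VALUES (?, ?)", (company, product))
        (cpk,) = conn.execute("SELECT CPk FROM CP WHERE Company = ? AND Product = ?",
                              (company, product)).fetchone()
        conn.execute("INSERT OR IGNORE INTO ACP (Agent, CPk) VALUES (?, ?)", (agent, cpk))
        (acpk,) = conn.execute("SELECT ACPk FROM ACP WHERE Agent = ? AND CPk = ?",
                               (agent, cpk)).fetchone()
        conn.execute("INSERT OR IGNORE INTO KACP (City, ACPk) VALUES (?, ?)", (city, acpk))


conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
load(conn, ROWS)
recovered = conn.execute(QUERY).fetchall()
print(recovered == sorted(ROWS))  # True
```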

22.2 Ontologic normal form An orphan 3NF table R of schema S leaves its orphan condition through the designer's ontological reasoning in St. Anselm's fashion. That is, nothing better "can be imagined" [1078Ans] than creating the corresponding entity classes and adding the corresponding references in the otherwise orphan entity. For instance, the orphan table EMPLOYEE (EmpNo, DeptName... ) ... can be ontologically repaired just by creating the DEPT (DeptName) class. The DeptName attribute of EMPLOYEE is then horned with "REFERENCES DEPT", i.e. EMPLOYEE (EmpNo, DeptName∊DEPT ...).
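The two repair steps can be sketched in SQLite through Python's `sqlite3` module. This is only an illustration: the department names are hypothetical sample values, and the intermediate table name `EMPLOYEE_OLD` is an assumption. SQLite's ALTER TABLE cannot attach a foreign key to an existing column, so the sketch rebuilds EMPLOYEE with the REFERENCES clause and copies the rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# The orphan table: DeptName has no home set (hypothetical sample rows).
conn.execute("CREATE TABLE EMPLOYEE_OLD (EmpNo TEXT PRIMARY KEY, DeptName TEXT NOT NULL)")
conn.executemany("INSERT INTO EMPLOYEE_OLD VALUES (?, ?)",
                 [("015", "Design"), ("019", "Management"), ("022", "Design")])

# Step 1: create the DEPT class from the values already in use.
conn.execute("CREATE TABLE DEPT (DeptName TEXT PRIMARY KEY)")
conn.execute("INSERT INTO DEPT SELECT DISTINCT DeptName FROM EMPLOYEE_OLD")

# Step 2: horn DeptName with REFERENCES DEPT by rebuilding the table.
conn.execute("CREATE TABLE EMPLOYEE (EmpNo TEXT PRIMARY KEY, "
             "DeptName TEXT NOT NULL REFERENCES DEPT)")
conn.execute("INSERT INTO EMPLOYEE SELECT EmpNo, DeptName FROM EMPLOYEE_OLD")
conn.execute("DROP TABLE EMPLOYEE_OLD")

# A department outside the DEPT class is now rejected.
try:
    conn.execute("INSERT INTO EMPLOYEE VALUES ('036', 'Sales')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```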

22.3 Orphan PK component The second application of Ontologic normal form consists in repairing an existing attribute that is a key component of an associative entity (or of a characteristic entity) but currently has no home set. An associative primary key with some single component that is not a foreign key is syntactically correct but semantically inconsistent. A partner in an associative entity must exist as an individual entity type before playing any role in an association. Besides, a single component of an associative primary key that is not a foreign key is a "hang" component. For instance, the orphan associative table:
Figure 12. KACP report on QNF tables

21.11 Quine normal form of KACP table

In Quine normal form, the KACP table becomes the KACPS database schema of irreducible 3NF tables plus the SQL VIEW KACP_REPORT corresponding to the original tabulation.

21.11.1 KACPS database schema The SQL specs of the new KACPS schema in Quine normal form follow. KACPS: {K(City); A(Agent); C(Company); P(Product); CP (CPk, Company∊C, Product∊P); ACP (ACPk, Agent∊A, CPk∊CP); KACP(City∊K, ACPk∊ACP)}.

21.11.2 SQL VIEW of original tabulation The original set of rows can easily be obtained using the following VIEW:
CREATE VIEW KACP_REPORT (City, Agent, Company, Product) AS
SELECT City, Agent, Company, Product
FROM ((KACP NATURAL JOIN ACP) NATURAL JOIN CP);
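The KACPS specs can be written as concrete DDL. The sketch below assumes SQLite (through Python's `sqlite3`) and renders each `∊` reference as a REFERENCES clause. It loads a small slice of Figures 9 to 11 with the document's own mnemonic surrogates. Waterloo is deliberately left out of K so that the foreign key on KACP.City has something to refuse.

```python
import sqlite3

KACPS = """
CREATE TABLE K (City    TEXT PRIMARY KEY);
CREATE TABLE A (Agent   TEXT PRIMARY KEY);
CREATE TABLE C (Company TEXT PRIMARY KEY);
CREATE TABLE P (Product TEXT PRIMARY KEY);
CREATE TABLE CP   (CPk  TEXT PRIMARY KEY,
                   Company TEXT NOT NULL REFERENCES C,
                   Product TEXT NOT NULL REFERENCES P,
                   UNIQUE (Company, Product));
CREATE TABLE ACP  (ACPk TEXT PRIMARY KEY,
                   Agent TEXT NOT NULL REFERENCES A,
                   CPk   TEXT NOT NULL REFERENCES CP,
                   UNIQUE (Agent, CPk));
CREATE TABLE KACP (City TEXT NOT NULL REFERENCES K,
                   ACPk TEXT NOT NULL REFERENCES ACP,
                   PRIMARY KEY (City, ACPk));
CREATE VIEW KACP_REPORT (City, Agent, Company, Product) AS
    SELECT City, Agent, Company, Product
    FROM KACP NATURAL JOIN ACP NATURAL JOIN CP;
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(KACPS)

conn.executemany("INSERT INTO K VALUES (?)", [("Davenport",), ("Des Moines",)])
conn.executemany("INSERT INTO A VALUES (?)", [("Smith",), ("Truman",)])
conn.executemany("INSERT INTO C VALUES (?)", [("FORD",), ("TOYOTA",)])
conn.executemany("INSERT INTO P VALUES (?)", [("bus",), ("car",), ("truck",)])
conn.executemany("INSERT INTO CP VALUES (?, ?, ?)",
                 [("Fc", "FORD", "car"), ("Ft", "FORD", "truck"), ("Tb", "TOYOTA", "bus")])
conn.executemany("INSERT INTO ACP VALUES (?, ?, ?)",
                 [("SFc", "Smith", "Fc"), ("SFt", "Smith", "Ft"), ("TTb", "Truman", "Tb")])
conn.executemany("INSERT INTO KACP VALUES (?, ?)",
                 [("Davenport", "SFc"), ("Davenport", "SFt"), ("Des Moines", "TTb")])

report = conn.execute("SELECT * FROM KACP_REPORT ORDER BY City, Agent, Product").fetchall()
print(report)

# 'Waterloo' is not in K, so the reference is refused.
try:
    conn.execute("INSERT INTO KACP VALUES ('Waterloo', 'SFc')")
    refused = False
except sqlite3.IntegrityError:
    refused = True
```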

22. ONTOLOGIC NORMAL FORM The report on the Ontologic normal form has the following sections: • Orphan entity; • Ontologic normal form; • Orphan PK component; and • Ontologic normal form (k).

22.1 Orphan entity Given a schema S of more than one table in 3NF, any unreferenced R is an "orphan" entity. An orphan entity is a flaw of schema S, but such a weak defect can be easily repaired. A new class entity is ontological when its creation gives a home set to an existing attribute that is not a foreign key. The existing attribute can be a descriptive attribute of the current entity or a primary key component.


R
Agent  Company  Product
Brown  Toyota   bus
Brown  Toyota   car
Jones  GM       car
Jones  GM       truck
Smith  Ford     car
Smith  Ford     truck

22.4 Ontologic normal form (k) In this case, creating a new class and adding a foreign key to an existing primary key is part of the normalization added-value chain. R can be ontologically repaired by creating the A (Agent), C (Company) and P (Product) RM/T classes. The classes A, C and P are also populated, but without converting R into a Cartesian product.
A (Agent): Brown, Jones, Smith, Fisher
C (Company): Ford, GM, Toyota, Chrysler
P (Product): bus, car, truck, van
The last step is horning the corresponding PK components of the R table with their references, i.e.:
R
Agent∊A  Company∊C  Product∊P
Brown    Toyota     bus
...      ...        ...
Smith    Ford       truck
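A "hang" component can be detected from catalogue metadata. The sketch below assumes SQLite (through Python's `sqlite3`) and uses its `PRAGMA table_info` and `PRAGMA foreign_key_list`. The helper `hang_components` and the intermediate table `R_OLD` are hypothetical names. The sketch lists the PK columns of R without a foreign key before and after the ontological repair. It then confirms that R remains a proper subset of A × C × P.

```python
import sqlite3


def hang_components(conn, table):
    """Primary-key columns of `table` that no foreign key covers."""
    # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
    pk = [row[1] for row in conn.execute(f"PRAGMA table_info({table})") if row[5] > 0]
    # PRAGMA foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
    fk = {row[3] for row in conn.execute(f"PRAGMA foreign_key_list({table})")}
    return [column for column in pk if column not in fk]


R_ROWS = [("Brown", "Toyota", "bus"), ("Brown", "Toyota", "car"), ("Jones", "GM", "car"),
          ("Jones", "GM", "truck"), ("Smith", "Ford", "car"), ("Smith", "Ford", "truck")]

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE R_OLD (Agent TEXT, Company TEXT, Product TEXT, "
             "PRIMARY KEY (Agent, Company, Product))")
conn.executemany("INSERT INTO R_OLD VALUES (?, ?, ?)", R_ROWS)
print(hang_components(conn, "R_OLD"))  # ['Agent', 'Company', 'Product']

# Create the classes A, C and P, and an R whose PK components are horned.
conn.executescript("""
    CREATE TABLE A (Agent   TEXT PRIMARY KEY);
    CREATE TABLE C (Company TEXT PRIMARY KEY);
    CREATE TABLE P (Product TEXT PRIMARY KEY);
    CREATE TABLE R (Agent   TEXT NOT NULL REFERENCES A,
                    Company TEXT NOT NULL REFERENCES C,
                    Product TEXT NOT NULL REFERENCES P,
                    PRIMARY KEY (Agent, Company, Product));
""")
conn.executemany("INSERT INTO A VALUES (?)", [(v,) for v in ("Brown", "Jones", "Smith", "Fisher")])
conn.executemany("INSERT INTO C VALUES (?)", [(v,) for v in ("Ford", "GM", "Toyota", "Chrysler")])
conn.executemany("INSERT INTO P VALUES (?)", [(v,) for v in ("bus", "car", "truck", "van")])
conn.execute("INSERT INTO R SELECT Agent, Company, Product FROM R_OLD")
print(hang_components(conn, "R"))  # []

# R stays a proper subset of A x C x P, i.e. it is not a Cartesian product.
n_r = conn.execute("SELECT COUNT(*) FROM R").fetchone()[0]
n_acp = conn.execute("SELECT COUNT(*) FROM A, C, P").fetchone()[0]
print(n_r, n_acp)  # 6 64
```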

22.5 Anselmian privilege A proactive normalizer can declare a foreign key on some orphan attribute of R during the data review, even if the entity is not orphan. This provokes the creation of the corresponding referenced thesaurus as part of the new 3NF schema of R.


23. INCLUSION NORMAL FORM This report considers the case of two columns 〈x, y〉 related by the binary relation x⊊y, i.e. the values of x are properly included in those of y. Both columns are part of the same relational schema but not of the same table. The INCLUSION DEPENDENCY report has the following entries: • Inclusion dependency; • Inclusion dependence; • Inclusion dependence as ERA case; • Inclusion dependence as RM case; and • Inclusion normal form.

23.1 Inclusion dependency "Column A is inclusion dependent on Column B. That is, the set of db-values in R.A is a subset of the db-values in R.B: R.A is-in R.B" [1990Cod]. This definition derives from the original definition by Casanova, Fagin and Papadimitriou [1983CFP], which is the general case: "In general, an inclusion dependency is of the form R [A₁, ... Aₓ] ⊊ S [B₁, ... Bₓ] where R and S are relation names (possibly the same), and where each Aᵢ and Bᵢ are attributes" [1983CFP].

23.2 Inclusion dependency vs. specialization "As an example, an inclusion dependency can say that every MANAGER entry of the R relation appears as an EMPLOYEE entry of the S relation" [1983CFP]. The specialization, e.g. "MANAGER IsA EMPLOYEE", is not a conspicuous example of inclusion dependency. Let us consider that specialization, i.e. the IsA binary relation, has its own legitimacy and that, except formally, specialization is not a case of Inclusion dependency. The RM meaning of the IsA binary relation and of a specialization schema [1990Cod] in irreducible 3NF [2000Dat] can be found in the UNMARKED NORMAL FORM report.

23.3 Inclusion dependence Column B is inclusion dependent on Column A, in the same database schema, if the set of values in B is a proper subset of the values in A, i.e. B⊊A. Our definition follows Codd's laconism. It adopts the generalization that A and B can be columns of different relations and uses the standard symbol for 'Proper Subset Of', i.e. 'B⊊A' [1983CFP]. With the new name, our definition underlines that some inclusion dependencies are not neutral from the point of view of data design, even in 3NF. Hence the name "inclusion dependence" instead of "inclusion dependency". In other words, Inclusion dependence refers to a 3NF table design defect.

23.4 Inclusion dependence as ERA case The resulting definition allows interpreting Inclusion dependency as a case of an ERA one-to-one relationship between entity R (A, ...) and entity S (B, ...), with R of greater cardinality than S. Consequently, R entity cases participate optionally in the R-S relationship, while all S cases participate in the 1:1 relationship.

23.5 Inclusion dependence as RM case Inclusion normal form deals with a schema of two relations in 3NF. They are semantically independent and share the identifier domain and name, but they have different descriptive properties. The different cardinalities of R and S allow: (a) saving multiple occurrences of NULL values in the included column, should a molecule like RS (A, ... B∊RS, ... ) be considered; and (b) saving the two steps needed when creating table RS to obtain a self-reference.

23.5.1 Create 3NF inclusion schema

CREATE TABLE EMPLOYEE (EmpNo, Name, HireDate, Job, BirthDate, Salary);
CREATE TABLE CHESS_RANGE (EmpNo∈EMPLOYEE, ChessRange) WHERE CHESS_RANGE.EmpNo⊊EMPLOYEE.EmpNo};

EMPLOYEE
EmpNo  Name              HireDate   Job          Salary
015    David Brown       03-Mar-65  Designer     $27,740
016    Bruce Adamson     15-Feb-71  Designer     $25,280
018    Maria P Smith     06-Jul-72  Designer     $21,340
019    Irving F Stern    14-Sep-72  Manager      $32,250
022    James H Walker    26-Jul-74  Programmer   $20,450
023    Christine Lopez   28-May-76  DW Designer  $29,840
024    Ross G Lopez      28-May-76  DBA          $29,840
026    William T Jones   15-Jun-76  Designer     $24,680
036    Elizabeth Pianka  11-Oct-76  Programmer   $22,250
045    John F Smith      15-Sep-77  Designer     $24,680
059    David Brown       11-Apr-78  Operator     $18,270

CHESS_RANGE
EmpNo∈EMPLOYEE  ChessRange
018             760
019             500
026             500
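In a concrete DBMS, the pseudo-clause `WHERE CHESS_RANGE.EmpNo⊊EMPLOYEE.EmpNo` splits into two parts. The sketch below assumes SQLite (through Python's `sqlite3`) and omits the EMPLOYEE columns other than EmpNo. A foreign key enforces the inclusion (⊆). The "proper" part cannot be declared, because SQLite has no CREATE ASSERTION, so it is checked with two EXCEPT queries. The helper `properly_included` is a hypothetical name.

```python
import sqlite3

EMPLOYEES = ["015", "016", "018", "019", "022", "023", "024", "026", "036", "045", "059"]
CHESS = [("018", 760), ("019", 500), ("026", 500)]

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE EMPLOYEE (EmpNo TEXT PRIMARY KEY);  -- other columns omitted
    CREATE TABLE CHESS_RANGE (EmpNo TEXT PRIMARY KEY REFERENCES EMPLOYEE,
                              ChessRange INTEGER NOT NULL);
""")
conn.executemany("INSERT INTO EMPLOYEE VALUES (?)", [(e,) for e in EMPLOYEES])
conn.executemany("INSERT INTO CHESS_RANGE VALUES (?, ?)", CHESS)


def properly_included(conn, sub, sup, column):
    """True when the values of sub.column form a proper subset of sup.column (B ⊊ A)."""
    outside = conn.execute(
        f"SELECT {column} FROM {sub} EXCEPT SELECT {column} FROM {sup}").fetchall()
    surplus = conn.execute(
        f"SELECT {column} FROM {sup} EXCEPT SELECT {column} FROM {sub}").fetchall()
    return not outside and bool(surplus)


print(properly_included(conn, "CHESS_RANGE", "EMPLOYEE", "EmpNo"))  # True
print(properly_included(conn, "EMPLOYEE", "CHESS_RANGE", "EmpNo"))  # False
```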

23.6 Inclusion normal form Normalizing the inclusion dependence schema ID {EMPLOYEE; CHESS_RANGE} simply adds a proper primary key, e.g. PlayerNo, to the current included table. The including column and its table are left as they were before. In this way, the old primary key of CHESS_RANGE, which was shared with the EMPLOYEE table, becomes a denotative foreign key. It represents the 1:1 ERA relationship, with a supervened alternate key.
CREATE SCHEMA STD {
CREATE TABLE EMPLOYEE (EmpNo, Name, HireDate, Job, Salary);
CREATE TABLE CHESS_RANGE (PlayerNo, ChessRange, EmpNo∈EMPLOYEE)};
The inclusion dependence documentation becomes unnecessary, and the tabulation of EMPLOYEE does not change. The CHESS_RANGE tabulation has the new primary key column filled in.
CHESS_RANGE
PlayerNo  ChessRange  EmpNo∈EMPLOYEE
01        760         018
02        500         019
03        500         026
The coordinates of a datum of ChessRange have changed from 〈CHESS_RANGE·EmpNo(value)·ChessRange〉 to 〈CHESS_RANGE·PlayerNo(value)·ChessRange〉.
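The STD schema can be sketched in SQLite through Python's `sqlite3`. EMPLOYEE is reduced here to EmpNo and Name. Declaring the supervened alternate key as `UNIQUE` is an assumption of this sketch, since the text leaves that key implicit. With it, a second player row for the same employee is rejected, which keeps the relationship one-to-one. A join on EmpNo still reaches the generic EMPLOYEE data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE EMPLOYEE (EmpNo TEXT PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE CHESS_RANGE (
        PlayerNo   TEXT PRIMARY KEY,                          -- the new proper key
        ChessRange INTEGER NOT NULL,
        EmpNo      TEXT NOT NULL UNIQUE REFERENCES EMPLOYEE   -- denotative 1:1 reference
    );
""")
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?)",
                 [("018", "Maria P Smith"), ("019", "Irving F Stern"),
                  ("026", "William T Jones"), ("059", "David Brown")])
conn.executemany("INSERT INTO CHESS_RANGE VALUES (?, ?, ?)",
                 [("01", 760, "018"), ("02", 500, "019"), ("03", 500, "026")])

# The alternate key on EmpNo keeps the relationship one-to-one.
try:
    conn.execute("INSERT INTO CHESS_RANGE VALUES ('04', 900, '018')")
    one_to_one = False
except sqlite3.IntegrityError:
    one_to_one = True

players = conn.execute("SELECT PlayerNo, Name, ChessRange FROM CHESS_RANGE "
                       "JOIN EMPLOYEE USING (EmpNo) ORDER BY PlayerNo").fetchall()
print(players)
```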

24. UNMARKED NORMAL FORM «The two most important types of null value have the meanings: • "value at present unknown or ω-mark"; and • "property inapplicable or i-mark"» [1979Cod]. The UNMARKED NORMAL FORM report starts from a 3NF relation with "many occurrences of the special null which means property inapplicable" [1979Cod]. It reduces their number by creating a schema of 3NF tables with zero i-marks. Once freed of every kind of mark, a table that was already in 3NF becomes a table in "irreducible 3NF" (abbr. i3NF).

Let us create a 3NF inclusion schema with the following pseudo-specs and the corresponding tabulations. CREATE SCHEMA ID {


UNMARKED NORMAL FORM report has the following sections: • Normalizing every mark; • Attribute is null; • Attribute is not null with default; • Unmarked policy; • Create molecule; and • Unmarked normal form.

24.1 Normalizing every mark specs The target now is normalizing every design mark, obtaining a 3NF schema in which every column IS NOT NULL without semantic loss. Having every column not null implies: [1] zero i-marks in the production database columns; [2] zero ω-marks in the database columns, and SQL programs with better RAS; and [3] zero ␢-marks in production, coming from "Zero IS NOT NULL WITH DEFAULT" at table design.

24.2 Attribute is null Why should we separate an attribute of a 3NF table into a subtable in Unmarked normal form? We do this because: o Missing values impair FD analysis; o Third normal form applies to value columns; and o Holistic data design assumes value columns.

24.2.1 Missing value impairs FD analysis Database schema integrity serves the target "Zero defect data" [1991Han], with "Zero defect data design" as its laconic methodology. Until now, we have mentioned only two design defects: o Partial dependence; and o Transitive dependence. However, they can only be detected on the basis of Dependency analysis. In its turn, functional dependency compares two values according to the Identity rule of Principia Mathematica [1910R&W], i.e. without "missing values" [1990Cod] (aka "missing marks" [1990Cod] or SQL "IS NULL" specs [1992SQL]).
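The effect of a missing value on FD analysis can be observed in SQLite through Python's `sqlite3`. The CITY rows are hypothetical sample values, and `fd_violations` is a hypothetical helper. It reports the left-hand values of a broken FD CityCode → CityName through GROUP BY and COUNT(DISTINCT …). The MU group passes only because COUNT(DISTINCT …) skips the NULL. The identity comparison on which the FD rests is never performed, as `NULL = NULL` itself shows by yielding NULL rather than true.

```python
import sqlite3


def fd_violations(conn, table, lhs, rhs):
    """LHS values that determine more than one distinct RHS value (FD lhs -> rhs broken)."""
    sql = (f"SELECT {lhs} FROM {table} GROUP BY {lhs} "
           f"HAVING COUNT(DISTINCT {rhs}) > 1 ORDER BY {lhs}")
    return [row[0] for row in conn.execute(sql)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CITY (CityCode TEXT, CityName TEXT)")
conn.executemany("INSERT INTO CITY VALUES (?, ?)", [
    ("MU", "Murcia"), ("MU", None),         # a missing value hides a possible conflict
    ("CT", "Cartagena"), ("CT", "Lorca"),   # a plain FD violation
])

print(fd_violations(conn, "CITY", "CityCode", "CityName"))  # ['CT']
print(conn.execute("SELECT NULL = NULL").fetchone()[0])      # None
```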

24.2.2 Third normal form applies to value columns "The concepts and rules of functional dependency and multivalued dependency were developed without considering missing db-values" [1990Cod].

24.2.3 Holistic data design assumes value columns Partial dependence and Transitive dependence defects are also circumvented with the help of five proven Third normal form patterns: Class entity, Kernel entity, Characteristic entity, Associative entity and Denotative entity. This always assumes that db-values carry no NULL marks (and no default values in the CK components). Under that assumption, any dependency daemon in the production database can check that "all columns that are not part of the primary key are functionally dependent on the primary key" [1990Cod].

24.3 Attribute is not null with default "We note in passing that such a format also leads to ambiguities regarding the meanings of blank fields. A blank SKILL could mean the person has no skill, that the field is not applicable to this employee, that the data is unknown, or, as in this case, that the data may be found in another record" [1983Ken]. Using a default value lets functional dependency behave correctly in the designed tabulations and in the test and production environments. Besides, using a default value circumvents the poor MAYBE logic inherent in coding the IS NULL predicate of SQL queries. However, from the point of view of "Zero defect data design", this data design practice can be enhanced. The improvement is to reduce its exaggerated number of default values without information, as the quotation from Kent heading this section points out.

24.4 Unmarked normal form in a nutshell Instead of representing missing information by a database mark, let missing information stay missing, i.e. outside the database until such information comes from reality. Instead of representing volatile and optional information by a default value, insert its row, change the value or delete its row as usual in a dedicated table. In this way, all three marks (i-marks, ω-marks, ␢-marks) vanish from the database. In detail, code "IS NOT NULL" every time. Treat each relation with some column that is a candidate for "IS NOT NULL WITH DEFAULT" as if it were part of a molecule with many i-marks. This reduces the marks of the original column to zero with just a dedicated subtable and an IS NOT NULL.

24.4.1 Every SQL column IS NOT NULL

Therefore, "all the normal forms based on these dependencies were also developed without considering missing db-values” [1990Cod].

"Every SQL column IS NOT NULL" follows the track of the paper "How To Handle Missing Information Without Using NULL" by Hugh Darwen, in, Warwick University (27 September 2006).

“It should be clear that, because nulls (or, as they are now called, marks) are not database values, the rules of functional and multi-valued dependency do not apply to them. Instead, they apply to all unmarked db-values” [1990Cod].

Let us create a 3NF molecule with the following pseudo-specs and the corresponding tabulation.
CREATE MOLECULE EMPLOYEE {
  (EmpId, Name, BirthDate, Job, Degree, TypingSpeed, InHolydays IS NOT NULL WITH DEFAULT '␢'),
  CASE Job {VALUE 'engineer' RECORDS Degree
            VALUE 'secretary' RECORDS TypingSpeed}};
Please assume that the CASE option, apart from specifying each subtype daemon inside the EMPLOYEE table, also defines each specialized attribute as: ... IS NOT NULL WITH DEFAULT '␢'

The normalization concepts only apply to a database in which rows contain IS NOT NULL values in the columns (not missing values). Moreover, they only apply to a database relation whose rows contain neither missing values nor default values in the candidate key component column(s).


24.5 Create molecule


24.5.1 EMPLOYEE tabulation

Figure 13. Molecule EMPLOYEE

24.5.2 Many i-marks per column Why should we separate an attribute of a 3NF table into a subtable in Unmarked normal form? We do this when the attribute is part of an IsA hierarchy that also records specialized facts, e.g. TypingSpeed [IS NULL]. Such a marked attribute is only active when the value 'secretary' occupies the cell of attribute Job.

24.5.3 Many default values per column We separate an attribute of a 3NF table into a subtable in Unmarked normal form when the attribute records optional facts and is declared "IS NOT NULL WITH DEFAULT", e.g. InHolydays.

24.6 Unmarked normal form Normalizing a molecule, e.g. the EMPLOYEE table, into a schema of "irreducible 3NF" [2000Dat] tables factorizes the original table into the following parts. The first is a base table with the attributes that are already "IS NOT NULL", e.g. EMPLOYEE. The second is a family of specialized subtables (if any), e.g. ENGINEER and SECRETARY. The third is a set of tables with "IS NOT NULL" columns and the layout INN (INNId, OldPK∊BASE, OldDefaultCol), holding a variable number of informed rows (if any). For instance, with the following pseudo-code:
CREATE TABLE EMPLOYEE (EmpId, Name, HireDate, Job, BirthDate);
CREATE SCHEMA E {EMPLOYEE ALIAS EMP};
CREATE TABLE E.ENGINEER (EN#, Degree, EmpId∊E.EMP);
CREATE TABLE E.SECRETARY (SE#, TypingSpeed, EmpId∊E.EMP);
CREATE TABLE E.IN_HOLYDAYS (IH#, InHolydays, EmpId∊E.EMP);
Each subtable has its own primary key. Reusing the primary key of the original 3NF table, e.g. EMPLOYEE, would not comply with Unmarked normal form, which must not be confused with a logical vertical partition (please see NATURAL NORMAL FORM). EmpId represents a reference in ENGINEER, SECRETARY and IN_HOLYDAYS. The EmpId reference is not M:1 (descriptive) but 1:1 (denotative). The denotative reference allows a subtype to "inherit" its generic attributes. The denotative reference, i.e. EmpId∊E.EMP, is also a supervened alternate key of the table holding it. However, it is not documented, in keeping with Codd's policy of letting AKs vanish.
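The factorization can be sketched in SQLite through Python's `sqlite3`. Several assumptions apply. The columns are reduced to a few, and EN#, SE# and IH# are renamed EnNo, SeNo and IhNo, because `#` is not allowed in unquoted SQLite identifiers. The alternate keys are declared `UNIQUE`. The TypingSpeed and InHolydays values are illustrative only. No base table stores a NULL. The old molecule is recovered only as a derived view (the hypothetical `EMPLOYEE_MOLECULE`), where the NULLs reappear as query results rather than stored marks.

```python
import sqlite3

SCHEMA = """
CREATE TABLE EMPLOYEE (EmpId TEXT PRIMARY KEY, Name TEXT NOT NULL, Job TEXT NOT NULL);
CREATE TABLE ENGINEER (EnNo INTEGER PRIMARY KEY, Degree TEXT NOT NULL,
                       EmpId TEXT NOT NULL UNIQUE REFERENCES EMPLOYEE);
CREATE TABLE SECRETARY (SeNo INTEGER PRIMARY KEY, TypingSpeed INTEGER NOT NULL,
                        EmpId TEXT NOT NULL UNIQUE REFERENCES EMPLOYEE);
CREATE TABLE IN_HOLYDAYS (IhNo INTEGER PRIMARY KEY, InHolydays TEXT NOT NULL,
                          EmpId TEXT NOT NULL UNIQUE REFERENCES EMPLOYEE);
-- The old molecule survives only as a derived view; its NULLs are never stored.
CREATE VIEW EMPLOYEE_MOLECULE AS
    SELECT e.EmpId, e.Name, e.Job, g.Degree, s.TypingSpeed, h.InHolydays
    FROM EMPLOYEE e
    LEFT JOIN ENGINEER g    ON g.EmpId = e.EmpId
    LEFT JOIN SECRETARY s   ON s.EmpId = e.EmpId
    LEFT JOIN IN_HOLYDAYS h ON h.EmpId = e.EmpId;
"""


def stored_nulls(conn):
    """Count NULL cells over every base table (views are excluded)."""
    total = 0
    for (table,) in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall():
        for column in [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]:
            total += conn.execute(
                f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL").fetchone()[0]
    return total


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
                 [("12", "Grez Orlando", "engineer"), ("14", "Kim N Natz", "engineer"),
                  ("27", "Maria P Smith", "secretary"), ("30", "Sally A Kwan", "manager")])
conn.executemany("INSERT INTO ENGINEER VALUES (?, ?, ?)", [(1, "bachelor", "12"), (2, "master", "14")])
conn.execute("INSERT INTO SECRETARY VALUES (1, 700, '27')")
conn.execute("INSERT INTO IN_HOLYDAYS VALUES (1, '26-03/31-03-72', '14')")

print(stored_nulls(conn))  # 0
print(conn.execute("SELECT COUNT(*) FROM EMPLOYEE_MOLECULE WHERE TypingSpeed IS NULL").fetchone()[0])  # 3
```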


EMPLOYEE
E#  Name             HireDate   Job        BirthDate
01  Roy R Alonzo     08-May-62  fieldrep   01-Nov-29
10  Bruce Adamson    06-May-63  trucker    17-May-47
12  Grez Orlando     12-Nov-63  engineer   18-Oct-42
14  Kim N Natz       12-Nov-63  engineer   19-Dec-46
18  Maria P Smith    05-Dec-63  fieldrep   21-Feb-49
19  James H Walker   11-Sep-64  designer   25-Jun-52
20  David Brown      16-Jun-65  trucker    29-May-41
22  Ross G Lopez     26-Feb-68  engineer   19-Mar-54
25  Daniel S Smith   05-Jan-71  trucker    12-Nov-39
26  Sybil P Johnson  03-May-71  trucker    05-Oct-36
27  Maria P Smith    05-May-71  secretary  26-Aug-54
30  Sally A Kwan     06-Jul-72  manager    11-May-44
33  Wing Lee         14-Jul-75  engineer   18-Jul-41
50  John B Geyer     28-May-76  clerk      15-Apr-30
60  Irving F Stern   28-May-76  operator   07-Jul-45
70  Eva D Pulaski    15-Jun-76  engineer   15-Aug-48

ENGINEER
EN#  E#∊EMP  Degree
1    12      bachelor
2    14      master
3    22      mba
4    33      associate
5    70      master

SECRETARY SE# E#∊EMP TypingSpeed
1 2 3 4 5
465 10 500 12 540 25 700 26 27 700

IN_HOLYDAYS IH# E#∊EMP InHolydays
26-03/04-04-72 10 10 26-03/31-03-72 70 70

This family, made of the new EMPLOYEE table and its three new subtables {ENGINEER, SECRETARY, IN_HOLYDAYS}, is a canonical schema of irreducible 3NF tables. It holds business information only, under the following lemma: "Every attribute IS NOT NULL" IMPLIES {"Zero attribute IS NULL" & "Zero attribute IS NOT NULL WITH DEFAULT"}. Unmarked normal form reduces the inapplicable property marks (i-marks) to zero. These are the Degree cells of rows whose EMPLOYEE.Job is not 'engineer' and the TypingSpeed cells of rows whose EMPLOYEE.Job is not 'secretary'. As expected, Unmarked normal form also reduces the '␢'-marks of the marked tables to zero, e.g. in the old EMPLOYEE.InHolydays column. In all cases, every column of the new tables is "IS NOT NULL", because the cells without database information are no longer part of the database.

25. LACONIC NORMAL FORM Laconic normal form deals with a 3NF relation with one or more "embedded" one-to-one relationships, for instance a molecule with EMPLOYEE data and optional unitary documents. The design problem here is not the "occurrences of the special null which means value inapplicable" [1979Cod]. It is the semantic assignment of descriptive attributes to each of the "original" entities, knowing only the primary keys involved.


In this molecular case, the i-marks help indirectly with the semantic objective of correlating each descriptive attribute with its own entity identifier. The result is a clean schema of n 3NF entities with (n−1) one-to-one ERA relationships. This report has two sections: • Create ambiguous molecule; and • Laconic normal form.

25.1 Create ambiguous molecule

Let us create a 3NF molecule with the following pseudo-specs and the corresponding tabulation:

CREATE TABLE EMPLOYEE (EmpNo, SSN, DriverLic, Name, HireDate, Job, Since, State);

Please assume that the SSN, DriverLic, Since and State attributes are declared: ... IS NOT NULL WITH DEFAULT ' '

25.1.1 Ambiguous tabulation

25.2 Laconic normal form

A molecule with n candidate keys has one primary key and (n−1) alternate keys. We also assume that each alternate key has at least one descriptive attribute, i.e. at least one of the "IS NOT NULL WITH DEFAULT" attributes is a property of that specific alternate key. Let us call the set of n tables a LACONIC family. Following the metaphor, "SPARTAN" is the name of the future table that will host the current primary key but none of the "NOT NULL WITH DEFAULT" attributes, and "HELOT" is the generic name of any of the other tables of this family. Each HELOT category is defined by the set of descriptive columns x such that, if {IF (column R.x has an ω-mark) THEN (candidate key R.y has an ω-mark in the same row) END-IF} holds in R, then R.y is a HELOT primary key and x is a non-key attribute of RY. In other words, in the ideal table RY whose primary key is y, x is also an attribute with a value in every row, and that value is a property of y. In summary, we have discovered that y↣x holds in R, which promotes the existence of a new 3NF table called RY. The set of descriptive attributes not assigned to any HELOT table is the descriptive attribute set of the SPARTAN table, which will hold the original primary key. On the other hand, the descriptive attributes of the SPARTAN table are also the attributes common to all HELOT tables. The SPARTAN table's own descriptive attribute set cannot be empty; in other words, at least one common attribute must exist for the set to qualify as a Laconic family. Finally, neither the SPARTAN rows nor the HELOT rows carry i-marks.
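The ω-mark test that assigns descriptive columns to HELOT tables can be sketched in Python. This is a minimal sketch under stated assumptions, not the author's procedure: the ω-mark is taken to be the ' ' default of the IS NOT NULL WITH DEFAULT columns, only those defaulted columns are tested (as the SPARTAN definition requires), the helper names are hypothetical, and the three sample rows (employees 02, 19 and 42) are taken from the Laconic tabulation in 25.2.1 and recombined into the ambiguous molecule shape. The HELOT tables in the result are keyed by their key column names (SSN, DriverLic) rather than SS and LIC.

```python
BLANK = " "  # assumed ω-mark: the ' ' default of IS NOT NULL WITH DEFAULT columns


def helot_columns(rows, alternate_keys, defaulted):
    """For each alternate key y, list the defaulted non-key columns x such that
    'IF x has an ω-mark THEN y has an ω-mark' holds in every row.
    Caveat: a column that never holds a mark satisfies the rule vacuously."""
    return {
        y: [x for x in defaulted
            if x not in alternate_keys
            and all(r[y] == BLANK for r in rows if r[x] == BLANK)]
        for y in alternate_keys
    }


def laconic_family(rows, pk, alternate_keys, defaulted):
    """Split the molecule into one SPARTAN table plus one HELOT table per key."""
    helots = helot_columns(rows, alternate_keys, defaulted)
    spartan_cols = [c for c in rows[0]
                    if c not in alternate_keys and c not in defaulted]
    family = {"SPARTAN": [{c: r[c] for c in spartan_cols} for r in rows]}
    for y, xs in helots.items():
        # Rows whose key y is blank simply do not enter the HELOT table.
        family[y] = [{**{c: r[c] for c in [y, *xs]}, pk: r[pk]}
                     for r in rows if r[y] != BLANK]
    return family


molecule = [
    {"EmpNo": "02", "SSN": "575-14-0109", "DriverLic": "F255-9215-0121-03",
     "Name": "David Brown", "HireDate": "03-Mar-65", "Job": "Designer",
     "Since": "12-Sep-59", "State": "CA"},
    {"EmpNo": "19", "SSN": "575-14-0190", "DriverLic": BLANK,
     "Name": "James H Walker", "HireDate": "26-Jul-74", "Job": "Designer",
     "Since": "11-Jul-69", "State": BLANK},
    {"EmpNo": "42", "SSN": BLANK, "DriverLic": "F255-9882-0121-00",
     "Name": "David Brown", "HireDate": "11-Apr-78", "Job": "Designer",
     "Since": BLANK, "State": "CA"},
]
KEYS = ["SSN", "DriverLic"]
DEFAULTED = ["SSN", "DriverLic", "Since", "State"]
print(helot_columns(molecule, KEYS, DEFAULTED))
# {'SSN': ['Since'], 'DriverLic': ['State']}
family = laconic_family(molecule, "EmpNo", KEYS, DEFAULTED)
```

Because the rule is evaluated on a sample instance, it can suggest an assignment but cannot prove the underlying dependency y↣x.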

25.2.1 Laconic family

The schema of this Laconic family is E: {EMPLOYEE; SS; LIC}, with the following specifications: E.EMPLOYEE (EmpNo, Name, HireDate, Job); E.SS (SSN, Since, EmpNo∈EMPLOYEE); and E.LIC (DriverLic, State, EmpNo∈EMPLOYEE).

E. Villar THIRD NORMAL FORM FUNDAMENTALS

The SPARTAN category is EMPLOYEE, which follows.

EMPLOYEE
EmpNo  Name                HireDate   Job
02     David Brown         03-Mar-65  Designer
05     Bruce Adamson       15-Feb-71  Designer
08     Maria P Smith       06-Jul-72  Designer
10     Irving F Stern      14-Sep-72  Manager
19     James H Walker      26-Jul-74  Designer
21     Christine G Lopez   28-May-76  Designer
22     Ross G Lopez        28-May-76  Designer
23     William T Jones     15-Jun-76  Designer
26     Elizabeth R Pianka  11-Oct-76  Designer
37     John F Smith        15-Sep-77  Designer
42     David Brown         11-Apr-78  Designer

The HELOT instances, i.e. SS and LIC, follow.

SS
SSN          Since      EmpNo∈EMPLOYEE
575-14-0109  12-Sep-59  02
575-14-0119  18-Jun-64  05
575-14-0110  14-Jan-66  08
575-14-0129  12-Oct-61  10
575-14-0190  11-Jul-69  19
575-14-2119  02-Apr-71  21
575-14-0108  02-Apr-71  22
575-14-0107  15-Dec-68  23
575-14-0106  30-Apr-71  26
575-14-0169  06-Nov-70  37

LIC
DriverLic          State  EmpNo∈EMPLOYEE
F255-9215-0121-03  CA     02
F255-1515-0121-00  CA     05
F255-1592-0121-00  CA     08
F255-1925-0121-00  CA     10
F255-1921-0121-00  CA     21
F255-9215-0121-00  CA     22
F255-9925-0121-00  CA     23
F255-9615-0121-00  CA     26
F255-9882-0121-00  CA     42

26. OCCAM NORMAL FORM

Occam normal form deals with a 3NF schema in which every one-to-one ERA relationship has been designed as a standalone denotative RM/T entity, for instance the previous SPARTAN schema {EMPLOYEE, SS, LIC} augmented with the tables {EM_SS, EM_LIC}. Such a PLURAL schema can simply be designed without the pure denotative entities, and clearing those entities away is the business of Occam's razor. The result will be a clean schema of n 3NF entities with (n−1) one-to-one ERA relationships, but without the standalone associative tables {EM_SS, EM_LIC}. This report has three sections:
• Create plural schema;
• Occam's razor; and
• Occam normal form.

26.1 Create plural schema

Let us create a 3NF schema with the following pseudo-specs and the corresponding tabulation:

CREATE SCHEMA PLURAL {EMPLOYEE (EmpNo, Name, HireDate, Job); SS (SSN, Since); LIC (DriverLic, State)};
CREATE PLURAL.EM_SS (SSN∊SS, EmpNo∊EMPLOYEE);
CREATE PLURAL.EM_LIC (DriverLic∊LIC, EmpNo∊EMPLOYEE);


26.1.1 Plural schema tabulations

EMPLOYEE
EmpNo  Name                HireDate   Job
02     David Brown         03-Mar-65  Designer
05     Bruce Adamson       15-Feb-71  Designer
08     Maria P Smith       06-Jul-72  Designer
10     Irving F Stern      14-Sep-72  Manager
19     James H Walker      26-Jul-74  Designer
21     Christine G Lopez   28-May-76  Designer
22     Ross G Lopez        28-May-76  Designer
23     William T Jones     15-Jun-76  Designer
26     Elizabeth R Pianka  11-Oct-76  Designer
37     John F Smith        15-Sep-77  Designer
42     David Brown         11-Apr-78  Designer

SS
SSN          Since
575-14-0109  12-Sep-59
575-14-0119  18-Jun-64
575-14-0110  14-Jan-66
575-14-0129  12-Oct-61
575-14-0190  11-Jul-69
575-14-2119  02-Apr-71
575-14-0108  02-Apr-71
575-14-0107  15-Dec-68
575-14-0106  30-Apr-71
575-14-0169  06-Nov-70

LIC
DriverLic          State
F255-9215-0121-03  CA
F255-1515-0121-00  CA
F255-1592-0121-00  CA
F255-1925-0121-00  CA
F255-1921-0121-00  CA
F255-9215-0121-00  CA
F255-9925-0121-00  CA
F255-9615-0121-00  CA
F255-9882-0121-00  CA

EM_SS
SSN∊SS       EmpNo∊EMPLOYEE
575-14-0109  02
575-14-0119  05
575-14-0110  08
575-14-0129  10
575-14-0190  19
575-14-2119  21
575-14-0108  22
575-14-0107  23
575-14-0106  26
575-14-0169  37

EM_LIC
DriverLic∊LIC      EmpNo∊EMPLOYEE
F255-9215-0121-03  02
F255-1515-0121-00  05
F255-1592-0121-00  08
F255-1925-0121-00  10
F255-1921-0121-00  21
F255-9215-0121-00  22
F255-9925-0121-00  23
F255-9615-0121-00  26
F255-9882-0121-00  42

26.2 Occam’s razor

"Occam’s razor (sometimes Ockham’s) is named after the English philosopher and Franciscan friar Father William of Occam (c. 1288–c. 1348), who wrote “Numquam ponenda est pluralitas sine necessitate” (plurality must never be posited without necessity), and “Frustra fit per plura quod potest fieri per pauciora” (it is futile to do with more what can be done with less)" [2015Bus].


26.3 Occam normal form

Let us create a 3NF schema in Occam normal form with the following pseudo-specs:

CREATE SCHEMA OCCAM {EMPLOYEE (EmpNo, Name, HireDate, Job); SS (SSN, Since, EmpNo∊EMPLOYEE); LIC (DriverLic, State, EmpNo∊EMPLOYEE)};

The EMPLOYEE table does not change, but the SS and LIC tables each absorb the implementation of one ERA one-to-one relationship, i.e. a new column holding the primary key of EMPLOYEE, acting both as a foreign key and as a supervening alternate key.
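The Occam's razor step, from PLURAL to OCCAM, can be sketched with Python's sqlite3 module. Assumptions: the column types are invented, SS_PLURAL is a hypothetical name for PLURAL.SS (so that both shapes fit in one in-memory database), and only the SS side is shown; LIC would be handled the same way. The UNIQUE constraint on SS.EmpNo is what turns the absorbed foreign key into an alternate key and so preserves the one-to-one relationship.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.executescript("""
CREATE TABLE EMPLOYEE (EmpNo TEXT PRIMARY KEY, Name TEXT NOT NULL,
                       HireDate TEXT NOT NULL, Job TEXT NOT NULL);
-- PLURAL shape: SS without EmpNo, plus a standalone associative table EM_SS
CREATE TABLE SS_PLURAL (SSN TEXT PRIMARY KEY, Since TEXT NOT NULL);
CREATE TABLE EM_SS (SSN   TEXT PRIMARY KEY REFERENCES SS_PLURAL,
                    EmpNo TEXT NOT NULL UNIQUE REFERENCES EMPLOYEE);
-- OCCAM shape: EmpNo is a foreign key and, via UNIQUE, an alternate key
CREATE TABLE SS (SSN   TEXT PRIMARY KEY, Since TEXT NOT NULL,
                 EmpNo TEXT NOT NULL UNIQUE REFERENCES EMPLOYEE);
INSERT INTO EMPLOYEE  VALUES ('02', 'David Brown',   '03-Mar-65', 'Designer'),
                             ('05', 'Bruce Adamson', '15-Feb-71', 'Designer'),
                             ('42', 'David Brown',   '11-Apr-78', 'Designer');
INSERT INTO SS_PLURAL VALUES ('575-14-0109', '12-Sep-59'), ('575-14-0119', '18-Jun-64');
INSERT INTO EM_SS     VALUES ('575-14-0109', '02'), ('575-14-0119', '05');

-- Occam's razor: fold each associative row into its SS row ...
INSERT INTO SS (SSN, Since, EmpNo)
    SELECT p.SSN, p.Since, a.EmpNo
    FROM SS_PLURAL p JOIN EM_SS a ON a.SSN = p.SSN;
-- ... and drop the now redundant tables (child first, then parent).
DROP TABLE EM_SS;
DROP TABLE SS_PLURAL;
""")
print(con.execute("SELECT SSN, Since, EmpNo FROM SS ORDER BY EmpNo").fetchall())
# [('575-14-0109', '12-Sep-59', '02'), ('575-14-0119', '18-Jun-64', '05')]
```

A second SS row for the same employee is now rejected by the UNIQUE constraint, just as the standalone EM_SS table used to guarantee.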

26.3.1 Occam schema tabulations

The EMPLOYEE, SS and LIC tabulations in Occam normal form are the ones the reader already knows as the result of Laconic normal form.

27. UNION NORMAL FORM

A horizontally partitioned 3NF database table R is simply a horizontal partition of the rows of the ideal table R into several tables, i.e. R¹, R², etc. The rows are partitioned according to some distribution criterion, so that each row of R exists in only one of the partitions of R. A distributed schema is not optimal, because the functional specifications of the ideal R can be met with just one table. The report on Union normal form has the following sections:
• Horizontal database partition;
• Union normal form; and
• Transparent distribution.

27.1 Horizontally partitioned database table

A horizontally partitioned database table, also called a horizontal database distribution, spreads the rows of the ideal table R across several tables, i.e. R¹, R², etc. Each horizontal partition is a different relvar and a different set of rows, but all of them share the same set of attributes and the same dependency structure. This design has serious flaws. First of all, it confuses the table specifications, which belong to Logical design, with the distribution of data rows, which is a matter for the Distribution transparency facility [1985Cod] of the r-DBMS. A second problem is that the number of DML statements in the programs grows linearly with the number of tables in the distributed schema. Another operational concern is that a query across all the united tables will have a worse response time than a query on a single table. We therefore have good reasons to investigate whether the inspected schema contains 3NF tables with the same set of attributes under different table names.
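A first screening for horizontally partitioned tables can be sketched in Python by grouping tables on their attribute sets. This is a hypothetical helper that works on attribute names only, with an invented directory extract; since attribute order carries no information, sets are compared rather than lists. A real check would also compare domains and dependency structures before declaring two tables partitions of one ideal R.

```python
from collections import defaultdict


def same_intension_groups(schema):
    """Group tables whose attribute sets are identical (attribute order ignored)."""
    groups = defaultdict(list)
    for table, attributes in schema.items():
        groups[frozenset(attributes)].append(table)
    return [sorted(tables) for tables in groups.values() if len(tables) > 1]


schema = {  # hypothetical directory extract: table -> attribute names
    "CUSTOMER_EAST": ["AccountNr", "Name", "City"],
    "CUSTOMER_WEST": ["AccountNr", "City", "Name"],
    "BALANCE": ["AccountNr", "CurrentBalance"],
}
print(same_intension_groups(schema))  # [['CUSTOMER_EAST', 'CUSTOMER_WEST']]
```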

27.2 Union normal form

On the other hand, if we find some horizontal distribution of the rows of R, solving it during development is an easy task:
1) Select a new name for the resulting undistributed table, the ideal R, and keep a record of all the old names;
2) Discard the specifications of the distributed tables (except one);
3) Rename the remaining specification with the new name of the future undistributed 3NF table;
4) Find in the DBMS directory the tables (if any) that reference any of the distributed tables, and replace the old table name in each foreign key with the new name;
5) The distributed tabulations share the fate of their tables, except that the rows of the deleted tables must be united in the new tabulation.

The resulting base table R is now in irreducible 3NF, with its attribute intension and its external references ready. The R tabulation (if any) will hold all the rows of the distributed tabulations.
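The Union normal form steps can be sketched with Python's sqlite3 module, using two hypothetical partitions of a CUSTOMER table with invented rows. Step 4 (repointing the foreign keys of referencing tables) is omitted because the sample has no referencing tables.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Hypothetical horizontal partitions of an ideal CUSTOMER table
CREATE TABLE CUSTOMER_EAST (AccountNr TEXT PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE CUSTOMER_WEST (AccountNr TEXT PRIMARY KEY, Name TEXT NOT NULL);
INSERT INTO CUSTOMER_EAST VALUES ('A1', 'Ann'), ('A2', 'Bob');
INSERT INTO CUSTOMER_WEST VALUES ('B1', 'Carl');

-- Steps 1 to 3: one undistributed table with the common intension
CREATE TABLE CUSTOMER (AccountNr TEXT PRIMARY KEY, Name TEXT NOT NULL);
-- Step 5: unite the rows; the PRIMARY KEY rejects any key present in two
-- partitions, i.e. a violation of the distribution criterion
INSERT INTO CUSTOMER
    SELECT AccountNr, Name FROM CUSTOMER_EAST
    UNION ALL
    SELECT AccountNr, Name FROM CUSTOMER_WEST;
DROP TABLE CUSTOMER_EAST;
DROP TABLE CUSTOMER_WEST;
""")
print(con.execute("SELECT COUNT(*) FROM CUSTOMER").fetchone()[0])  # 3
```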

27.3 Transparent distribution

In the past, designing a distributed schema was driven by the fear of reaching the DBMS limit on the number of rows in a base table. That is probably the only reasonable motive for undertaking such a thankless project, and current r-DBMS limits make this design challenge unlikely. If distributing the rows of R were needed, current r-DBMSs offer transparent row distribution among the company servers [1985Cod], under the care of a distributed query optimizer. This is a Physical design task, entirely in the hands of the DBA and system programmers, in accordance with Codd's elegant rule on Distribution transparency: "We should be able to split the data on the r-DBMS out onto multiple physical systems without the user realizing it" [1985Cod].

28. NATURAL NORMAL FORM

A database schema with some ideal table R vertically partitioned unnecessarily increases the number of tables in the schema, disqualifying the schema from being optimal [1974Cod]. Moreover, a specific vertical partition is also the only way of camouflaging a transitive dependence across several tables, i.e. R¹, R², etc., generating a "crossing redundancy" in the Rⁱ that owns the dependent attribute, which is invisible to the standard normalization procedure. This happens because the determinant x is in Rx while the dependent attribute is in Ry, and normalization assumes that its input comes from a single table R. The aim of Natural normal form is to solve both defects economically. The report NATURAL NORMAL FORM explains these defects and the details of the solution in the following sections:
• Vertically partitioned database;
• Camouflaged vertical partition;
• Natural normal form;
• Natural normal form (Phase 1);
• Natural normal form (Phase 2); and
• Transparent vertical partition.

28.1 Vertical database partition

A table Vⁱ is a vertical database partition of an "ideal" base table R if Vⁱ has the same primary key and the same cardinality as the ideal table R, but only a disjoint part of R's full nonkey attribute set. Moreover, the natural join of all the vertical partitions directly recovers the primary key, the ideal nonkey attributes and all the ideal rows. Given an ideal base table R represented by a family of vertical partitions, each partition Vⁱ has a disjoint set of descriptive attributes but the same primary key. The number of partitions is usually very small, for instance two partitions such as {CUSTOMER (AccountNr, Name, StreetAddress, City, ZipCode); BALANCE (AccountNr, CurrentBalance)}.
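Candidate vertical partitions can be screened in Python by pairing tables that share a primary key and have disjoint non-key attributes, as in the CUSTOMER/BALANCE example. This is a hypothetical schema-level helper: it matches keys by attribute name and cannot check the equal-cardinality condition, which needs the data itself.

```python
from itertools import combinations


def vertical_partition_pairs(schema):
    """Return pairs of tables that share a primary key and whose non-key
    attribute sets are disjoint: candidates for a vertical partition."""
    pairs = []
    for (t1, (pk1, nk1)), (t2, (pk2, nk2)) in combinations(schema.items(), 2):
        if set(pk1) == set(pk2) and not set(nk1) & set(nk2):
            pairs.append((t1, t2))
    return pairs


schema = {  # table -> (primary key, non-key attributes)
    "CUSTOMER": (["AccountNr"], ["Name", "StreetAddress", "City", "ZipCode"]),
    "BALANCE": (["AccountNr"], ["CurrentBalance"]),
    "CITY": (["CityCode"], ["CityName"]),
}
print(vertical_partition_pairs(schema))  # [('CUSTOMER', 'BALANCE')]
```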

28.2 Natural normal form

Natural normal form deals with database vertical partitions two at a time, gluing them with a natural join. It also takes into account that some functional dependencies may be hidden in the joined R. Therefore, after gluing the ideal table R back together, it sends the table to the stack of standard table reviews, which will perform the dependency analysis and the standard normalization.

28.3 Natural normal form (Phase 1)

Every vertical partition Vⁱ should be detected, investigated and resolved with the natural join of all its rows, giving a set with the same cardinality as Vⁱ but whose attribute intension is the union of the nonkey intensions plus the "shared" primary key, e.g. CUSTOMER (AccountNr, Name, StreetAddress, City, ZipCode, CurrentBalance). Unfortunately, the natural join of two irreducible 3NF tables (3NF × 3NF) with the same primary key is a base table whose empirical dependency structure is unknown. In other words, the table resulting from this first, innocent-looking natural join between two 3NF tables is not so innocent: any natural join of two tables in 3NF can hide some transitive dependence [1974Cod]. This means that, for every joined table obtained from a set of vertical partitions, you must restart the inspection procedure on that table in order to analyze its current dependency structure, set up a Tree-with-one-source digraph, refactor the FD structure, etc.
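Phase 1 can be sketched with Python's sqlite3 module on the CUSTOMER/BALANCE example, using invented sample rows. The joined table has the same cardinality as each partition and the union of their attributes. Note that CREATE TABLE ... AS SELECT does not carry the PRIMARY KEY over, so a real migration would declare it explicitly before handing the table to Phase 2.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE CUSTOMER (AccountNr TEXT PRIMARY KEY, Name TEXT NOT NULL,
                       StreetAddress TEXT NOT NULL, City TEXT NOT NULL,
                       ZipCode TEXT NOT NULL);
CREATE TABLE BALANCE  (AccountNr TEXT PRIMARY KEY, CurrentBalance NUMERIC NOT NULL);
INSERT INTO CUSTOMER VALUES ('A1', 'Ann', '1 Main St', 'Springfield', '11111'),
                            ('A2', 'Bob', '9 High St', 'Springfield', '11111');
INSERT INTO BALANCE  VALUES ('A1', 120.5), ('A2', 0);

-- Phase 1: glue the vertical partitions back together. NATURAL JOIN
-- matches on the shared AccountNr and keeps a single copy of it.
CREATE TABLE CUSTOMER_JOINED AS SELECT * FROM CUSTOMER NATURAL JOIN BALANCE;
""")
columns = [info[1] for info in con.execute("PRAGMA table_info(CUSTOMER_JOINED)")]
print(columns)
# ['AccountNr', 'Name', 'StreetAddress', 'City', 'ZipCode', 'CurrentBalance']
```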

28.4 Natural normal form (Phase 2)

Any joined (3NF × 3NF) table must be treated as being in 1NF, because a crossing dependency analysis is still pending. Such a table is normalized by feeding it, in Natural normal form, back into the standard normalization procedure, as depicted in Figure 12.

Figure 14. (3NF × 3NF) Feedback

28.5 Transparent vertical partition

When table R has many attributes, a transparent vertical partition behavior can be an unavoidable economy, for example an SQL view exposing the most useful subset of R's attributes. Such behavior can be achieved by specifying the corresponding VIEW and materializing it.
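A transparent vertical partition can be sketched with an SQL view in Python's sqlite3 module. Assumptions: the view name CUSTOMER_BALANCE and the sample row are hypothetical, and because SQLite has no materialized views, a snapshot table created from the view stands in for materialization and would have to be refreshed by hand.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE CUSTOMER (AccountNr TEXT PRIMARY KEY, Name TEXT NOT NULL,
                       StreetAddress TEXT NOT NULL, City TEXT NOT NULL,
                       ZipCode TEXT NOT NULL, CurrentBalance NUMERIC NOT NULL);
INSERT INTO CUSTOMER VALUES ('A1', 'Ann', '1 Main St', 'Springfield', '11111', 120.5);

-- The base table R stays whole; users see only the useful attribute subset.
CREATE VIEW CUSTOMER_BALANCE AS
    SELECT AccountNr, Name, CurrentBalance FROM CUSTOMER;

-- SQLite has no materialized views: a snapshot table emulates one.
CREATE TABLE CUSTOMER_BALANCE_SNAPSHOT AS SELECT * FROM CUSTOMER_BALANCE;
""")
print(con.execute("SELECT * FROM CUSTOMER_BALANCE").fetchall())  # [('A1', 'Ann', 120.5)]
```

Products with native materialized views would keep the copy in step automatically; the point is that the partition lives in the Physical layer while the Logical schema keeps a single R.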

29. DATABASE OPTIMIZATION

Optimizing a relational schema whose tables are already in third normal form means guaranteeing that all its 3NF relations are irreducible [2000Dat]; the number of irreducible 3NF relations is then mathematically minimal [1974Cod], which is Codd's main condition. Schema optimization, however, starts with (1) each relation having a dedicated primary key.

AXIOM 4 (reminder): No two entities in the same schema can have the same identifier name or the same active domain.

A dedicated primary key opens the opportunity of (2) having a dedicated Descriptive Attribute Subset (abbr. DAß) per entity across the schema.


(3) A unique DAß per entity implies (4) a schema free of horizontal distribution and (5) a schema free of vertical partition design defects, which are the common ways of unnecessarily increasing the number of irreducible 3NF tables in a given database schema. The optimization of a given database schema also has a qualitative consequence, discovered (as expected) by E. F. Codd: spotting and refactoring the so-called hidden functional dependencies. The DATABASE SCHEMA OPTIMIZATION report has the following sections:
• Integrating the 3NF micro-schemas;
• Different types of database redundancy;
• Hidden redundancy;
• Hidden functional dependency;
• Vertical database partition;
• Optimal database schema;
• Optimal third normal form; and
• Optimal 3NF schema.

29.1 Integrating the 3NF micro-schemas

After normalizing all the tables, the designer has a set of irreducible 3NF tables, and a new task starts: recovering cohesion in a new schema that integrates the old and the new 3NF tables. This must be done one 3NF micro-schema at a time, taking advantage of the connections of the original schema.

29.2 Different types of database redundancy

Figure 15. Types of database redundancy

29.2.1 Strong redundancy of R

Strong redundancy [1979Cod], as depicted in Figure 13, involves all the data redundancy associated with partial or transitive dependence (or both), i.e. data loss, insertion delays and the saga of changes.

29.2.2 Weak redundancy of R

Weak redundancy, as depicted in Figure 13, refers to partial or transitive dependence (or both) in R, together with an extra table WR that holds the values of the attributes exposed to data loss by the delete anomaly, i.e.:

Strong redundancy + an extra table − information loss = Weak redundancy

Weak redundancy also remains a pedagogical workbench for introducing normalization in shops with opposing positions, so as to move ahead step by step.

29.3 Hidden redundancy

Hidden redundancy, as depicted in Figure 13, is a defect of a schema S arising from a hidden dependence that would be a functional dependency if all its participating attributes were in the same ideal table R; instead, the determinant causing the dependence is in table R¹ and the dependent attribute is in table R².

29.4 Hidden functional dependency

Let us call Hidden functional dependency an FD found within a non-loss join of two 3NF tables R¹ and R² that are both vertical partitions of an ideal table R, i.e.: (a) R¹ and R² have the same primary key domain; but (b) the descriptive attribute subsets of R¹ and R² are disjoint. Both partial and transitive dependences can be hidden, but a hidden partial dependence obviously requires the common primary key of R¹ and R² to be composite.

29.5 Normal stepwise refinement

Natural normal form only "glues" two vertical partitions of R. Hidden dependencies are discovered by restarting normalization with the naturally joined table in Natural normal form; the actual checking and repair of a hidden dependency remains the business of classic normalization. This normalization loop is a case of stepwise refinement [1971Wir] of an inspected ideal table R.

29.6 Optimal database schema

According to Codd [1974Cod], the definition of an optimal schema takes the following four steps (associated with some assumptions):
(1) "Now, even though each relation in a collection of relations may be in third normal form, it does not follow that the collection itself is in optimal third normal form" [1974Cod].
(2) "For example, consider the collection VP consisting of two relations {R (A, B); S (A, C)}, where in each case the primary key is underlined as usual" [1974Cod]. Here the primary key is A in both relations. (Let us "suppose R is non-loss joinable, with S on A" [1974Cod].)
(3) "The join T of R with S on A, i.e. T (A, B, C) ≔ R (A, B) [R.A = S.A] S (A, C); clearly possesses the functional dependencies of B on A, i.e. A→B, and C on A, i.e. A→C, but it might also possess the dependency of C on B, i.e. B→C" [1974Cod]. ("Let us suppose T does have B→C dependency" [1974Cod].)
(4) "Then, the more optimal collection STD: {T (B, C); R (A, B∊T)}, consisting of the projection T (B, C) together with the relation R, replaces the VP collection of relations" [1974Cod].

29.7 Optimal third normal form

The previous definition "shows the need to consider not only the functional dependencies within a given relation but also the dependencies within all the non-loss joins of these 3NF relations, when attempting to cast a given collection in optimal third normal form" [1974Cod] (abbr. o3NF).
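Codd's VP example can be checked mechanically with a small Python sketch. The helper names and sample values are hypothetical; fd_holds tests a dependency only on the given instance, so it can refute an FD but never prove one, which is why Codd treats B→C as a supposition about T. When B→C holds, the sketch replaces VP by STD and confirms that joining R with the projection T (B, C) reproduces T.

```python
def natural_join(r, s):
    """Natural join of two lists of dict rows on their common attributes."""
    common = set(r[0]) & set(s[0])
    return [{**x, **y} for x in r for y in s
            if all(x[a] == y[a] for a in common)]


def project(rows, attrs):
    """Projection with duplicate elimination, preserving first-seen order."""
    out = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in out:
            out.append(t)
    return out


def fd_holds(rows, lhs, rhs):
    """True if lhs -> rhs holds in this instance (it can refute, not prove, an FD)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True


# Codd's VP collection {R (A, B); S (A, C)} with hypothetical values.
R = [{"A": 1, "B": "x"}, {"A": 2, "B": "x"}, {"A": 3, "B": "y"}]
S = [{"A": 1, "C": 10}, {"A": 2, "C": 10}, {"A": 3, "C": 20}]
T = natural_join(R, S)  # T (A, B, C)

hidden = fd_holds(T, ["B"], ["C"])  # the hidden dependency of C on B
# If B -> C holds, replace VP by STD: {T (B, C); R (A, B)}.
STD = {"T": project(T, ["B", "C"]), "R": R} if hidden else {"R": R, "S": S}
print(hidden)    # True
print(STD["T"])  # [{'B': 'x', 'C': 10}, {'B': 'y', 'C': 20}]
```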

29.8 Optimal 3NF schema

An optimal schema displays the full functionality of its data stores [1979Y&C] with a minimal number of irreducible 3NF tables. A database schema with "every entity in 3NF and with its own DAß", in other words one in which every descriptive set is unique across the current schema STD, is probably evidence of an optimal schema in Codd's strong 1974 sense: the highest normalization label, with all the tables "in optimal third normal form" [1974Cod] and a minimal number of irreducible 3NF tables.


29.8.1 Free of strong redundancy

A database schema of complete and irreducible 3NF tables is free of strong redundancy [1979Cod] by definition.

29.8.2 Free of weak redundancy

A minimal database schema is also free of weak redundancy [1979Cod], because weak redundancy involves an extra table in addition to some of the effects of strong redundancy.

29.8.3 Free of hidden redundancy

A schema with unique DAßes is free of hidden redundancy because it is free of hidden functional dependencies. And it is free of hidden functional dependencies because it is free of database vertical partitions, which, in a schema of irreducible 3NF entities, are the only possible cause of a hidden functional dependency.

30. ACKNOWLEDGMENTS

Our thanks to William W. Armstrong [1974Arm] for his database structure concept. Special thanks for the slogan: "3NF in FD terms, please"! We will always remember the year 2003 for his insight that partial dependence and transitive dependence are independent design defects. His influence on our logical soul is indelible. For their invaluable company and loyalty, come hell or high water, all our thanks go to our old chaps Mauricio del Castillo and Vicky Gómez. Our thanks go to our first three readers: Pedro Lucía, Darío Del Saz and Carl Ehrig-Eggert (friends of the Harmonia & Software pub). Our thanks to the Assumptionist Sister Isabel Galbe (Chaparral NM USA), because her "oceanic" support arrived just in time. Our thanks to the following Villar people: Pablito, Quique, Ariadna, Nicolas, Ara, Paco, Carolina, Laura and Blanca. They are (and always will be) my minor Avengers. Our thanks to Laura Chaparro for making it true that the writer has a daughter who already holds a PhD in mathematics, and for giving me the photocopy of Codd's 1979 paper covering set division and the transitive reduction of binary relation sets.

[19xxPra] Pradhan, S. S. & others. "OntoNotes: A Unified Relational Semantic Representation", Semantic Computing, 2007, ICS.

31. REFERENCES

[1] [1980A&L] Adiba & Lindsay. (1980). "Database Snapshots", Proc. Sixth Conference on VLDB (Montreal, Quebec).
[2] [1993A&T] Akutsu & Takasu. (1993). "Inferring Approximate Functional Dependencies from Example Data", Knowledge Discovery in Databases Workshop, AAAI-93.
[3] [1998Amb] Ambler, A. W. (1998). Building object applications that work, Cambridge University Press, New York.
[4] [1078Ans] Anselm, St. (1078). "PROSLOGIUM: Ontological argument", Jonathan Barnes (translator), St. Anselm College, Manchester, USA-NH. (Available on the net following Wikipedia sources in 2015.)
[5] [1992SQL] ANSI X3.135-1992. (1992). Database Language SQL, ANSI ed., New York.
[6] [1974Arm] Armstrong, W. W. (1974). "Dependency Structures of Data Base Relationships" in Proc. IFIP Congress, North-Holland ed., Amsterdam & others.
[7] [1976Ast] Astrahan & others. (1976). "System R: A relational approach to data management", ACM Trans. on Database Systems.

[8] [1959Bac] Backus, J. W. (1959). "The syntax and semantics of the proposed international algebraic language of the Zurich ACM-GAMM Conference", Inter. Conf. Info. Proces. (UNESCO), Paris.
[9] [2011B&E] Bagui & Earp. (2011). Database Design Using Entity-Relationship Diagrams, Foundations of Database Design, Auerbach Pub.
[10] [1990BCN] Batini, Ceri, Navathe. (1992). Conceptual Database Design, Benjamin/Cummings Pub. Co., Redwood City USA-CA.
[11] [1976Ber] Bernstein, P. A. (1976). "Synthesizing third normal form relations from functional dependencies", ACM Trans. Database Syst. 1, 4.
[12] [1994Bro] Brody, M. (1994). "Phrase structure and dependence", Ms., University College London and Hungarian Academy of Sciences.
[13] [1972B&S] Bronshtein & Semendayev. (1972). A Guide Book to Mathematics, Springer ed.
[14] [2015Bus] Busemeyer & others. (2015). The Oxford Handbook of Computational and Mathematical Psychology, Oxford University Press.
[15] [1988Cal] Calasso, R. (1988 (1st ed.)). The Marriage of Cadmus and Harmony, Random House, 1994.
[16] [1937Car] Carnap, R. (1937). The Logical Syntax of Language, Routledge & Kegan Paul ed., London (cited by) Burdick, H. (1974). "On a Syntactical Characterization of Logical Expressions", Notre Dame Journal of Formal Logic Vol. XV, 3.
[17] [1997Cat] Cattell & others. (1997). The Object Database Standard: ODMG 2.0, Morgan Kaufmann CA.
[18] [2015CGI] CGI. (2015). "Comp 204: Computer Systems and Their Implementation" (Lecture 20: Parsing & Syntax) (available on the net as "204-Lecture20.pdf").
[19] [1976Cha] Chamberlin, D. D. (1976). "Relational Data-Base Management Systems", Computing Surveys, Vol. 8.
[20] [1981Cha] Chamberlin & others. (1981). "A History and Evaluation of System R", Comm. ACM, Vol. 24, 10.
[21] [1976Che] Chen, P. P. (1976). "The Entity-Relationship Model: Towards a unified view of data", ACM Trans. on Database Systems, Vol. 1, No. 1.
[22] [1957Cho] Chomsky, N. (1957 (1st ed.)). Syntactic Structures, Mouton de Gruyter, Berlin · New York, 2002.
[23] [1970Cod] Codd, E. F. (1970). "A Relational Model of Data for Large Shared Data Banks", Comm. ACM, Jun.
[24] [1971CEF] Codd, E. F. (1971). "NORMALIZED DATA BASE STRUCTURE: A BRIEF TUTORIAL" (IBM Research Report RJ935 1971), Proc. ACM SIGFIDET (now SIGMOD) Workshop on Data Description, Access and Control, ACM New York, NY, USA.
[25] [1971Cod] Codd, E. F. (1971). "Further Normalization of the Data Base Relational Model" (IBM Research Report RJ909 1971), Randall J. Rustin ed., Data Base Systems: Courant Computer Science Symposium 6th, Prentice-Hall, NJ, 1972.
[26] [1972Cod] Codd, E. F. (1972). "A data base sublanguage founded on the relational calculus", Proc. ACM SIGFIDET (now SIGMOD), ACM New York, NY, USA.
[27] [1974Cod] Codd, E. F. (1974). "Recent Investigations into Relational Data Base Systems" in Proc. IFIP Congress, North-Holland ed.
[28] [1979Cod] Codd, E. F. (1979). "Extending the Database Relational Model to capture more meaning" [aka RM/T], ACM Trans. on D. S.
[29] [1985Cod] Codd, E. F. (1985). "Is Your DBMS Really Relational?" and "Does Your DBMS Run By the Rules?", Computerworld, October 14 & 21.
[30] [1990Cod] Codd, E. F. (1990). The Relational Model for Database Management - Version 2, Addison-Wesley, Reading, Massachusetts & others.

[31] [2015C&D] Codd & Date, Ltd. (2015). 1210>Organization of Data>Normal Form Definitions. (Available on the net.)
[32] [1990Cor] Cornes, R. (1990). Business Systems Design and Development, Prentice Hall.
[33] [2003Dar] Darwen & others. (2003). Temporal data and the Relational Model, Morgan-Kaufmann.
[34] [1986Dat] Date, C. J. (1986). Selected Writings, Addison Wesley Longman ed., Boston.
[35] [1998Dat] Date, C. J. (1998). Relational Database Writings 1994-1997, Addison-Wesley Longman ed., Boston.
[36] [2000Dat] Date, C. J. (2000). An Introduction to DATABASE Systems (25th Anniversary), Addison-Wesley Longman ed., Reading MA.
[37] [1974Dea] Deaño, A. (1974 (1st ed.)). Introducción a la lógica formal, Alianza ed., Madrid, 5th reprint 1986.
[38] [1901Ded] Dedekind, R. (1901). Essays on the Theory of Numbers, Open Court ed., Project Gutenberg's (distribution web) [2015WMW].
[39] [1973D&C] Delobel & Casey. (1973). "Decomposition of a Data Base and the Theory of Boolean Switching Functions", IBM Journal Research & Development, Sept.
[40] [1976Fag] Fagan, M. E. (1976). "Design and code inspections to reduce errors in program development", IBM Systems Journal, Vol. 15, No. 3.
[41] [1977Fag] Fagin, R. (1977). "Multivalued Dependencies and a New Normal Form for Relational Databases", ACM Trans. on Database Systems, Vol. 2, No. 3.
[42] [1999Fow] Fowler, M. (1999). Refactoring: Improving the Design of Existing Code, Addison-Wesley, New Jersey USA.
[43] [1971Ear] Earley, J. (1971). "On the Semantics of Data Structures", Randall J. Rustin ed., Data Base Systems: Courant Computer Science Symposium 6th, Prentice-Hall, NJ, 1972.
[44] [1987Fre] Freytag, J. C. (1987). "A Rule-Based View of Query Optimization", Proc. SIGMOD Int. Conf., ACM ed., New York.
[45] [1993G&G] Gilb & Graham. (1993). Software Inspection, Addison-Wesley.
[46] [1991Grh] Graham, I. (1991). Object Oriented Methods, Addison-Wesley Longman, Inc.
[47] [1981Gra] Gray, J. (1981). "The Transaction Concept: Virtues and Limitations", printed by Tandem Computers Inc., Cupertino CA (available on the net). Appeared in Proc. of 7th Int. Conference on VLDB.
[48] [1991G&S] Gray & Siewiorek. (1991). "High Availability Computer Systems", Paper for IEEE Computer Magazine (draft). (Microsoft Research page)
[49] [1993Gra] Gray, J. (editor). (1993). The Benchmark Handbook, Morgan Kaufmann, San Mateo CA.
[50] [1996Gra] Gray & others. (1996). "DATA CUBE", Proc. 12th IEEE International Conf. on Data Engineering (New Orleans, US-LA).
[51] [1991Han] Hansen, M. D. (1991). "Zero defect data", Massachusetts Institute of Technology, Massachusetts. (Available online.)
[52] [1965Har] Harary & others. (1965). Structural Models, John Wiley & Sons, Inc., New York & others.
[53] [1999EVM] Villar, E. (1999). Natural Optimizer Guide, Harmonia Software, SA (editor), Madrid (preprinted).
[54] [1971Hea] Heath, I. J. (1971). "Unacceptable file operations in a relational data base", Proc. ACM SIGFIDET (now SIGMOD), ACM New York, NY, USA.
[55] [1820Heg] Hegel, G. W. F. (1820 (1st ed.)). Grundlinien der Philosophie des Rechts (Vorrede) [online] in Werke. Band 7, Frankfurt a. M. 1979, S. 11-29. Cited and translated in Colletti, L. (1972). "From Hegel to Marcuse" (New York: Monthly Review Press; copy online by Ralph Dumain).

[56] [1974Hei] van Heijenoort, J. (1974). "Subject and Predicate in Western Logic", JSTOR, Philosophy East and West, Vol. 24, 3.
[57] [1965H&L] Hughes & Londey. (1965). The Elements of Formal Logic, Methuen ed., London.
[58] [2004Hui] Huishi, J. Li. (2004). "An Introduction to Commutative Algebra", World Scientific Publishing.
[59] [1970IBM] IBM. (1970). "Data processor, Issues 13-17", Data Processing Division, ed. (Wikipedia "RAS" page)
[60] [1998IBM] IBM. (1998). Data Modeling Techniques for Data Warehousing, SG24-2238, Redbooks.
[61] [2008IBM] IBM. (2008). z/OS SQL Reference Version 7, (SC26-9944-07), 8th ed., Softcopy Only.
[62] [2008IBA] IBM. (2008). OS Administration Guide (DB2 for OS/390 and z/OS), (SC26-9931-07), 8th ed., Softcopy Only.
[63] [1996Inm] Inmon, W. H. (1996). Building the Data Warehouse, John Wiley & Sons, Inc.
[64] [2003Jan] Janas, J. M. (2003). "An enhanced a priori algorithm for mining multidimensional association rules", Proc. 25th Inter. Conf. IT Interfaces.
[65] [1984J&K] Jarke & Koch. (1984). "Query optimization in database systems", ACM Computing Surveys.
[66] [1983Ken] Kent, W. (1983). "A simple guide to five normal forms in relational database theory" in Comm. ACM, 1983.
[67] [1999Kra] Kracht, M. (1999). "Agreement Morphology, Argument Structure and Syntax". Available on the net as .pdf in 2002.
[68] [1921Kur] Kuratowski, C. (1921). "Sur la notion de l'ordre dans la théorie des ensembles", Fundamenta Mathematicae 2.
[69] [1961Kur] Kuratowski, K. (1961). INTRODUCTION TO SET THEORY AND TOPOLOGY, Pergamon Press, Oxford UK.
[70] [1969Lab] Labrousse & others. (1969). Las estructuras y los hombres, Ariel, Barcelona.
[71] [1995Lee] Lee, B. S. (1995). "Normalization in OODB Design", ACM SIGMOD Record, Vol. 24, No. 3.
[72] [2014Lep] Lepschy, G. (2014). History of Linguistics, Vol. II, Routledge, New York.
[73] [1834Lob] Lobachevsky, N. I. (1834), cited by Medvedev, F. A. (1991). Scenes from the History of Real Functions, Birkhäuser ed. Verlag Basel.
[74] [2016Luc] Lucidchart. (2016). "Free ER Diagram Tool". (available in //www.lucidchart.com)
[75] [1920Luk] Łukasiewicz, J. (1920 (1st ed. in Polish)). "On three-valued logic", Selected works by Jan Łukasiewicz, North-Holland, Amsterdam, 1970. (Ref. from Wikipedia "Łukasiewicz logic", 2016)
[76] [2000Mad] Maddux, R. D. (2000). "Axioms and rules of relevance logic", Slides at homepage for a talk at Vanderbilt University, June.
[77] [2006Mad] Maddux, R. D. (2006). Relation algebras (Studies in Logic and the Foundations of Mathematics Vol. 150), Elsevier ed., Amsterdam, The Netherlands.
[78] [2009Mad] Maddux, R. D. (2009). "Relevance logic and the calculus of relations" (February 8, 2009). (Personal Homepage of R. D. Maddux, Iowa State University)
[79] [1983Mai] Maier, D. (1983). The Theory of Relational Databases, Computer Science Press, Rockville MD.
[80] [1980M&Z] Melkanoff & Zaniolo. (1980). "Decomposition of relations and Synthesis of Entity-relationships diagrams", International Conference on ERA, North-Holland Pub.
[81] [1988Mer] Merton, R. K. (1988). "The Matthew Effect in Science", (PDF), ISIS 79.
[82] [1995MSo] Microsoft. (1995). INSIDE ODBC, Microsoft Press, Redmond WA.
[83] [2011Mic] MicroStrategy. (2011). "MicroStrategy Reporting Suite", printed brochure.
[84] [2000Mis] Miszczyk & others. (2000). IBM DB2 UDB for AS/400: Object Relational Support (SG24-5409-00), International Technical Support Organization. (Redbooks)
[85] [1987Miu] Miura & others. (1987). "On the irreducible Non First Normal Form Relations", Information Systems, Volume 12, Issue 3.
[86] [2002M&L] Mosterin & Torreti. (2002). Diccionario de Lógica y Filosofía de la Ciencia, Alianza ed., Madrid.
[87] [1954Nes] (available in www.todocolleccion.net, 2016)
[88] [1985Mix] Mix Software. (1985). "C/Database Toolchest Tutorial" (3.0.2B), Richardson TX USA.
[89] [1959Nic] Nicholas, R. (1959). "The Distinction between Predicate Intension and Extension", Revue Philosophique de Louvain, 3rd series, vol. 57, no. 56.
[90] [2015Ora] Oracle. (2015). "Query Optimizer Concepts". (Available on the net.)
[91] [1975Pal] Palmer, I. M. (1975). Data Base System: A Practical reference, QED Information Sciences, Inc. MA-USA.
[92] [1873Pei] Peirce, C. S. (1873). "Description of a Notation for the Logic of Relatives", Memoirs of the American Academy of Arts and Sciences, New Series, Vol. 9. (JSTOR .pdf)
[93] [1881Pei] Peirce, C. S. (1881). Writings of Charles S. Peirce (A Chronological Edition - Volume 4 1879–1884), Compiled by the editors of the "Peirce Edition Project", Indiana University Press, 1989.
[94] [1883Pei] Peirce, C. S. (1883). "Note B: The Logic of relatives", in Studies in Logic by Members of the Johns Hopkins University, edited by C. S. Peirce, Little Brown & co., Boston. Reprinted by John Benjamins Pub., Amsterdam & Philadelphia, 1983. (This citation is a digest of entry (195.) of the [2006Mad] Bibliography.)
[95] [1999Poo] Poolet, M. A. (1999). "Why you need database normalization" ("SQL by Design" internet page).
[96] [2008Prd] Pradhan & others. (2008). "A Unified Relational Semantic Representation", OntoNotes project, University of Pennsylvania. (Available on the net.)
[97] [1992Pra] Pratt, V. R. (1992). "Origins of the Calculus of Binary Relations", Proc. IEEE Symp.
on Logic in Computer Science, Santa Cruz, CA. [98] [1953Qui]Quine, W. V. (1953 (1st ed.)). From a logical point of view, Harper & Row, 1963. [99] [1954Qui]Quine, W. V. (1954). "Reduction to a Dyadic Predicate", Journal of Symbolic Logic, Vol. 19. [100] [1981Qui]Quine, W. V. (1981). Mathematical LOGIC, Harvard University Press, Cambridge & others, (Revised edition). [101] [1987Qui]Quine, W. V. (1987). Quiddities: An Intermittently Philosophical Dictionary. Cambridge, MA: Belknap of Harvard UP. [102] [1977Ris]Rissanem, J. (1977)."Independent Components of Relations", ACM Trans. on Database Systems, Vol. 2, No. 4, Dec. [103] [1970Rob]Robinson, J. J. (1970). “Dependency Structures and Transformations Rules”, Language 40, JSTOR Vol. 46. [104] [1903Rus]Russell, B. (1903 (1st ed.)). Principles of Mathematics, W.W. Norton ed., (1938 ed.) reprinted, (ISBN 0-393-00249-7). [105] [1910R&W]Russell &Whitehead. (1910 (1st ed.)). Principia Mathematica to *56, Vol. I, Cambridge University Press, London, etc. 1962. [106] [1992Sas]Sasha, D. E. (1992). Database tuning: a principled approach, Prentice-Hall, Inc., Upper Saddle River, NJ, USA. [107] [1963Sch]Schilpp, P. A. (1963). "The philosophy of Rudolf Carnap", chapter 18 of (Goodman, N., The significance of DER LOGISCHE AUFBAU DER WELT, Open Court, ed.). E. Villar THIRD NORMAL FORM FUNDAMENTALS

[108]

[1976Sha]Sharman, G. C. H. (1976). "A Constructive Definition of Third Normal Form", Proc. 1976 ACM SIGMOD Conference on Management of Data, Washington, D.C. [109] [1987S&K]Siebes & Kersten. (1987). "Design Axioms and Topology to Model Database Semantics", in Proc. 13th VLDB Conference (Brighton). [110] [1999Sim]Simpson, S. P. (1999). NATURAL Essentials ―Version 2.10―, (a self-study programming course), 19992000. (Available in the net) [111] [1983S&B]Schmidt & Brodie, (1983). Relational database systems: Analysis and comparison. SpringerVerlag. [112] [1985Smi]Smith, H. C. (1985). "Database Design Composing Fully Normalized Tables from a Rigorous Dependency Diagram", Comm. ACM, Vol. 28. [113] [1989SAG]Software AG, (1989). ADABAS System Architecture, (training course, formerly "ADABAS Internals"). [114] [1991Sta]Stalmarck, G. (1991). “Normalization Theorems for Full First Order Classical Natural Deduction”, Journal of Symbolic Logic, Vol. 56. [115] [1990Sto]Stonebraker & others. (1990). "ThirdGeneration Database System Manifesto", Newsletter ACM SIGMOD Record, Vol. 19, ACM New York (NY, USA). (1997). "Ultra™ [116] [1997SUN]SUN Microsystems. Enterprise™ 10000 Server: SunTrust™ Reliability, Availability, and Serviceability", Technical White Paper. [117] [2007Vaa]Väänänen, J. (2007). Dependence Logic, Cambridge University Press, New York. [118] [2015Van]Vandekerckhove & others. (2015). "Model Comparison and the Principle of Parsimony", University of California, Irvine. (Internet page) [119] [1985Wir]Wirth, N. (1985). Algorithms and Data Structures, Oberon version: August 2004. (Available in the net) [120][1999W&A]Widenius & Axmark. (1999). "MySQL Introduction", Linux Journal Nov. [121][2011WTR]Wikipedia. (2011). "Transitive reduction". [122][2014WDS]Wikipedia. (2014). "Disjoint sets". (2015). "Booolean algebra [123][2015WBA]Wikipedia. (structure)". [124][2015WCL] Wikipedia. (2015). "Cladogram". [125][2015WDI]Wikipedia. (2015). "Data integrity". [126][2015WDS]Wikipedia. 
(2015). "Data security". [127][1921Wit]Wittgenstein, L. (1921). Tractatus LogicoPhilosophicus, Routledge, London, 1981. [128][2015WMW]Wolfram MathWorld. (2015). "Boolean algebra" (internet page). [129][1956vNe]von Neumann, J. (1956), "Probabilistic Logics and the Synthesis of Reliable Organisms From Unreliable Components", Automata Studies, Princeton University Press. [130][1979Y&C]Yourdon & Constantine. (1979). Structured Design, Yourdon Press Computing Series, Prentice Hall, NJ. [131][1986Zac]Zachman, J. A. (1986). "A framework for information systems architecture", IBM Systems Journal, Vol. 26. [132][1982Zan]Zaniolo, C. (1982). "A New Normal Form for the Design of Relational Database Schemata", ACM Trans. Database System.


32. APPENDICES
Each section is a different appendix.

32.1 Data modelling
We recommend the book by Batini, Ceri & Navathe, Conceptual Database Design: An Entity-Relationship Approach [1992BCN], because the authors integrate "normality" as a data design quality: "We believe that normality is an important property of a schema" [1992BCN]. As already stated, based on their experience and intuition they also put forward the following thesis: "We stress that the ER model and [our] design methods tend naturally to produce normalized schemas" [1992BCN]. "The objective of normalization is to keep each functional dependency separated, by associating to each set of homogeneous FDs an element of the model (entity or relationship) that has the determinants of the FD as identifiers. Thus, each concept of the application domain is mapped to exactly one concept of the schema" [1992BCN]. "In our approach, normalization is a tool for validating the quality of the schema rather than a method for designing the schema" [1992BCN].

32.2 Synopsis of database terms
«For better or worse, SQL is intergalactic data speak» Stonebraker, Gray, Bernstein, Rowe, Lindsay, Carey, Brodie and Beech (1990) [1990Sto].
This appendix gives a synopsis of the database terms of the RM/T model, SQL and the ERA model, inspired by Codd [1990Cod]. It has the following entries:
o Entity type;
o Row cardinality & derived relation;
o Domain;
o Keys;
o Designer entities; and
o Complex designer entities.

32.2.1 Entity type
RM/T term | SQL term | ERA term
Base relation | Table | Entity [type]
Attribute | Column | Property
Number of attributes; Degree; n-arity | Number of columns | Number of properties

32.2.2 Row cardinality & derived relation
RM/T term | SQL term | ERA term
Tuple | Row | Case
Set of relation tuples; R*; R Extension; tabulation | Set of table rows; relvar [in FROM clause] | Set of entity cases
Cardinality; extension size | Number of rows | Number of cases
Derived relation | View | Derived entity

32.2.3 Domain
RM/T term | SQL term | ERA term
Domain | User data type | Business domain
Built-in domain | Data type | Domain
Only values in the attribute | {IS NOT NULL} column | Only values in the property

32.2.4 Keys
RM/T terms: Primary key; Candidate key; Primary domain; Nonkey domain; Minor-key domain; ElementOf; Denotative FK.
SQL terms: PRIMARY KEY; CANDIDATE KEY; ---; FOREIGN KEY; FOREIGN KEY.
ERA terms: Identifier; n/a; n/a; ---; M:1 relationship; 1:1 relationship.

32.2.5 Designer entities
RM/T term | SQL term | ERA term
Kernel relation | Referenced-only table | Entity
Characteristic relation | Characteristic table | Weak entity
Associative relation | Associative table | M:N relationship
Denotative relation | Denotative table | 1:1 relationship
Denoted relation | Denoted table | An entity + 1:1 relationship

32.2.6 Complex designer entities
RM/T term | SQL term | ERA term
Sub-relation | Sub-table | Sub-entity / IsA / Specialization
Tree graph | Tree schema | M:1 reflexive relationship
Digraph relation | BOM table | M:M reflexive relationship
Aggregate relation (M rows to 1 row) | Aggregate table | Aggregate entity

32.3 PM identity
'Identity' is the logic label for defining the values that any PM (Principia Mathematica) formula can take. The "first order predicate calculus" [1970Cod] (abbr. FOP calculus) is part of Russell's PM system [1965H&L]. The 'identity' label also appears in the well-known "Leibniz's postulate: the identity of indiscernibles" [Wikipedia (2015)].

32.3.1 Well-formed formula
Each well-formed formula (abbr. wff) of the PM system can only be true (abbr. T) or false (abbr. F). Otherwise, the evaluated statement is not a wff.

32.3.2 PM Identity definition
The format of the PM Identity exposition follows the original presentation of Codd's MAYBE Logic [1979Cod]; obviously, the content does not.


Taking advantage of the abbreviations in the previous subsection, a formal definition of PM Identity follows:
PM Identity ≝ {NOT(F)=T; NOT(T)=F}
Definition 9: PM Logic Identity
Logical identity matches the Scholastic law of the "EXCLUDED MIDDLE" (Wiktionary). We complete the PM Identity with the TRUE tables of the AND, OR, IMPLIES and NOR predicate connectors [1965H&L].
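The PM Identity and the two-valued TRUE tables of 32.5 can be generated mechanically, which is a quick way to check a hand-drawn table. The following is a minimal sketch, assuming Python booleans stand for T and F, that the row label is the left operand p and the column label the right operand q, and that the names `CONNECTIVES`, `truth_table` and `show` are illustrative only. MINUS is included with the meaning given in 32.5.5, p AND NOT q.

```python
from itertools import product

# Two-valued PM connectives over Python booleans (True = T, False = F).
CONNECTIVES = {
    "AND": lambda p, q: p and q,
    "OR": lambda p, q: p or q,
    "IMPLIES": lambda p, q: (not p) or q,   # IF p THEN q
    "NOR": lambda p, q: not (p or q),
    "MINUS": lambda p, q: p and not q,      # p AND NOT q
}


def truth_table(name):
    """Return {(p, q): value} for every pair of truth values."""
    op = CONNECTIVES[name]
    return {(p, q): op(p, q) for p, q in product((False, True), repeat=2)}


def show(name):
    """Print the table with p as row label and q as column label."""
    sym = {False: "F", True: "T"}
    table = truth_table(name)
    print(f"{name:8} F T")
    for p in (False, True):
        print(f"{sym[p]:8} {sym[table[(p, False)]]} {sym[table[(p, True)]]}")


# PM Identity: NOT(F)=T and NOT(T)=F, so negating twice gives the value back.
assert all((not (not v)) == v for v in (False, True))

for name in CONNECTIVES:
    show(name)
```

Building every table from one dictionary of lambdas keeps the definitions and the printed tables in a single place, so a change to a connective cannot drift away from its table.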

32.4 Glossary of predicate terms
The "Meaning" column comes from the Unicode names in the MS Word "Cambria Math" symbol table. We use the arrow symbols exclusively for the DETERMINES binary relation (xRy), i.e. x→y [1971Cod], and its corresponding family (according to Maddux [2000Mad]).

32.4.1 Predicates
Predicate: Definition
p=q: p Equal To q
p≠q: p Not Equal To q
≝: Equal To By Definition
¬p: NOT (p)
p|q: p OR q
p&q: p AND q
IF p THEN q: p IMPLIES q
p≡q: p EQUIVALENT TO q
p↑q: p NAND q
p↓q: p NOR q

The predicate notations are those of Hughes & Londey [1965H&L], with the exception of the IMPLIES notation, which follows the common "IF p THEN q" of the Functional Dependency field. "≝" is a metapredicate.

32.4.2 Predicate quantifiers
Quantifier: Meaning
∀: For all
∃: There exists
∄: There does not exist
∃!: There is exactly one
"∃!" comes from Plato (132BC), Peano (1886) and Wie (1893) (sources: Wikipedia "Uniqueness existential" page and its external links).
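Over a finite set the four quantifiers reduce to simple counts, which is how a query engine can evaluate them on a table. A minimal sketch follows; the function names and the CITY (CityCode, CityName) rows are hypothetical and chosen only to echo the CITY example of 1.1. Note that ∀ over an empty domain is vacuously true.

```python
def for_all(domain, pred):
    """∀x∊domain: pred(x). Vacuously true on an empty domain."""
    return all(pred(x) for x in domain)


def exists(domain, pred):
    """∃x∊domain: pred(x)."""
    return any(pred(x) for x in domain)


def not_exists(domain, pred):
    """∄x∊domain: pred(x), i.e. NOT ∃x: pred(x)."""
    return not exists(domain, pred)


def exists_unique(domain, pred):
    """∃!x∊domain: pred(x), i.e. exactly one element satisfies pred."""
    return sum(1 for x in domain if pred(x)) == 1


# Hypothetical CITY (CityCode, CityName) rows, for illustration only.
city = [("MU", "Murcia"), ("MA", "Madrid"), ("CT", "Cartagena")]

print(exists_unique(city, lambda row: row[0] == "MU"))  # True: one row has code MU
print(for_all(city, lambda row: len(row[0]) == 2))      # True
print(not_exists(city, lambda row: row[1] == ""))       # True
```

∃! is the quantifier behind key uniqueness: a CityCode value identifies a row only if exactly one row carries it.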

32.5 TRUE tables
The TRUE tables of AND, OR, IMPLIES, NOR and MINUS follow. In each table the row label is the left operand and the column label is the right operand.

32.5.1 TRUE table of AND
& F T
F F F
T F T

32.5.2 TRUE table of OR
| F T
F F T
T T T

32.5.3 TRUE table of IMPLIES
IMPLIES F T
F T T
T F T

32.5.4 TRUE table of NOR
↓ F T
F T F
T F F

32.5.5 TRUE table of MINUS
− F T
F F F
T T F
The meaning of A MINUS B is the following: {A MINUS B} = {A − B} = {A AND ~B}.

32.6 Logic of Classes
There exists a finite universe of discourse called U. The classes are sets whose extension is indeterminate but finite.

32.6.1 ElementOf and singular classes
"The vocabulary of set theory consists of just one binary predicate symbol ∈" [2007Vaa]. The ElementOf predicate symbol (∈) is able to connect an individual instance of a database relation R with an instance of another relation S, on the basis of their having the same value [1970Cod] and of the predicate formula of R including the name of S. The relation R is called the 'referencing' relation, and S the 'referenced' relation.
Element & Class: Definition
U: Universe [of atoms]
∅: The Empty Set
a∊A: a Element Of A
b∉A: b Not An Element Of A
~A: Complement Of A, i.e. ~A ≝ ∃U & U⊋A & x∊U & x∉A
|A|: Cardinality of A, i.e. number of atoms of A
The 'Element Of' predicate symbol is also able to connect a determinant attribute of a dependency structure of relation R with another to-be-detached substructure S, on the basis of saving the name of the detached substructure as the home set of the determinant attribute. The determinant attribute of relation R is called a 'connecting' attribute, and S the 'home set' of the determinant attribute.

32.6.2 Inclusion predicates
Class vs. Class: Definition
Aˢ⊃A: Aˢ Superset Of A
Aˢ⊇A: Aˢ Superset Of OR Equal To A
Aˢ⊋A: Aˢ Superset Of AND Not Equal To A
A′⊂A: A′ Subset Of A
A′⊆A: A′ Subset Of OR Equal To A
A≠B: A Not Equal To B
Caveat: Hughes & Londey's PM System [1965H&L] uses the inclusion symbol (p⊃q) for the 'p IMPLIES q' predicate, i.e. our 'IF p THEN q' predicate.
Given U, A, B non-empty finite sets with A⊂U and B⊂U, A≠B holds if and only if [|A|≠|B| or ¬(A∪B=A)] holds. The cardinality test is the classical human optimization of this check: it is cheap, and when |A|=|B| the union test alone decides the question.

32.6.3 Boolean classes
Boolean Classes: Definition
A∪B: A UNION B
A∩B: A INTERSECTION B
A⊍B: A DISJOINT B [≝ {A∩B = ∅}]

32.6.4 Set predicates
Predicate: Meaning
A−B; A÷B; A\B: A MINUS B
B\A: B MINUS A
IF A THEN B: {NOT(A) OR B} is true
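The A≠B test of 32.6.2, the DISJOINT predicate of 32.6.3 and the ElementOf connection of 32.6.1 can be checked on concrete sets. A minimal sketch, assuming Python sets stand for the classes; `sets_differ`, `disjoint` and `element_of_violations` are hypothetical helpers (not RM/T operators), and the CITY/employee values are made up for illustration.

```python
def sets_differ(a: set, b: set) -> bool:
    """A≠B test: |A|≠|B| OR NOT(A∪B = A).

    The cardinality comparison is the cheap shortcut; for finite sets with
    |A| = |B|, A∪B = A holds only if B⊆A, which then forces A = B.
    """
    if len(a) != len(b):
        return True
    return (a | b) != a


def disjoint(a: set, b: set) -> bool:
    """A DISJOINT B ≝ A∩B = ∅."""
    return not (a & b)


def element_of_violations(referencing_rows, connecting_attr, referenced_keys):
    """Rows of the referencing relation R whose connecting value is not ∊ the
    referenced set S of key values (hypothetical helper)."""
    return [row for row in referencing_rows if row[connecting_attr] not in referenced_keys]


city_codes = {"MU", "MA"}  # hypothetical primary-key values of CITY
employees = [{"Emp": 1, "CityCode": "MU"}, {"Emp": 2, "CityCode": "BCN"}]
print(element_of_violations(employees, "CityCode", city_codes))
# [{'Emp': 2, 'CityCode': 'BCN'}]
```

The ElementOf helper is, in SQL terms, the check that a FOREIGN KEY constraint performs: every connecting value of R must be an element of the referenced key set of S.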


32.6.5 Complex classes
Complex Classes: Definition
A×B: Cartesian Product (a set of couples)
S²: Cartesian Power (S×S)
ℙ(A): Powerset of A
U²: Universe of 〈x, y〉 atoms
℘(S): Powerset Of S (the set of all subsets of S); the atoms of a powerset are subsets, and each subset is an element of the powerset
Ω: Universe whose atoms are disjoint subsets of R attributes
Ω²: Universe of 〈x, y〉 pairs whose relatives are elements of Ω
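The powerset and the Cartesian product of 32.6.5 are easy to enumerate for a small attribute set, which helps to see why candidate determinants grow as 2ⁿ. A minimal sketch with the standard `itertools` module; the function names are illustrative, and R = {P, L, C} simply reuses the attributes of the 32.8.3 example.

```python
from itertools import chain, combinations, product


def powerset(s):
    """℘(S): all subsets of S; each subset is a frozenset, i.e. one element of ℘(S)."""
    items = list(s)
    return {frozenset(c)
            for c in chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))}


def cartesian(a, b):
    """A×B: the set of ordered couples (x, y) with x∊A and y∊B."""
    return set(product(a, b))


def cartesian_power(s):
    """S² = S×S."""
    return cartesian(s, s)


R = {"P", "L", "C"}               # attributes of the 32.8.3 example
print(len(powerset(R)))           # 8 = 2**3 subsets, from ∅ to R itself
print(len(cartesian_power(R)))    # 9 = 3**2 ordered couples
```

Frozensets are used because ordinary sets are unhashable and so cannot themselves be members of a set.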

32.7 Formulas As Sets
"Formulas as sets", for defining a Boolean algebra on sets, comes from R. Maddux [2000Mad]. Given an indeterminate but finite set U: {p, q, r,…} of atoms x∊U, and a family S: {A≠∅, B≠∅, C≠∅,…} of non-empty subsets of U, set operations are used to define a Boolean algebra on sets:
A|B ≝ A∪B = {x: x∊A OR x∊B}
A&B ≝ A∩B = {x: x∊A AND x∊B}

32.7.1 Empty Set and Complement
Empty Set and Complement (Set Negation) can be defined, too:
∅ ≝ {x: x≠x}
~A ≝ {x: x∊U AND x∉A}

32.7.2 Boolean Algebra Rules
Please recall the following meanings of A and B:
• A|B ≝ A∪B = {x: x∊A OR x∊B};
• A&B ≝ A∩B = {x: x∊A AND x∊B}.
Law: A∪B form; A∩B form
Idempotent: A∪A = A; A∩A = A
Commutative: A∪B = B∪A; A∩B = B∩A
Associative: (A∪B)∪C = A∪(B∪C) = A∪B∪C; (A∩B)∩C = A∩(B∩C) = A∩B∩C
Absorption: A∪(A∩B) = A; A∩(A∪B) = A
Distributive: A∩(B∪C) = (A∩B)∪(A∩C); A∪(B∩C) = (A∪B)∩(A∪C)
Complement: A∪~A = U; A∩~A = ∅
Neutral Element: A∪∅ = A; A∩U = A

32.8 Formulas As Relations
Formulas As Relations comes from R. Maddux [2000Mad]; it is a strong logical device for defining a Boolean algebra on a set of binary relations (as the Functional Dependency family 1.2 is).

32.8.1 'binary relation' IsSynonymOf 'xRy'
– xRy is a logic formula; xRy means "x subject bears [the binary relation R] to y subject" [1910R&W], for instance: is-greater-than (x>y), superset-of (x⊃y), loves (x♥y), pecks (xPy), etc.;
– xRy is a logic two-place predicate, i.e. Pxy, describing an existing relationship; the standard Pxy reading would be "x is the object of love of y subject" [1965H&L]; besides, Pxy atoms can be members of different sets;
– x and y are atoms of individuals of the same set, which is called U;
– Each ordered pair 〈x, y〉 is an element of U²;
– x is called the referent subject; y is called the relatum subject [1910R&W];
– 'Relative' is the Peircean adjective for logic connectives, set operations and subjects of binary relations:
– the ( and ) formula is called a relative product; given and , some pairs of R² implied by the premises hold (saving some observation);

32.8.2 Functional Dependency is a binary relation
– xFDy is read "x determines y"; x and y are elements of the same set, which is called R;
– The ordered pairs x→y are elements of R²;
– All 〈x, y〉 pairs of R² can be observed in R* using the OBSERVE method: (x→y) or ∼(x→y) observations are possible;
– The x referent is called the determinant [1971Hea]; the y relatum is called the dependent [1971Cod];
– The (x→y and y→z) product gives (x→z) (transitivity);
Remark: Both x and y can be subsets of attributes of R, i.e. {x} and {y} can be members of the powerset of R, which is called Ω;

32.8.3 Meaning of the set A MINUS B
The meaning of A MINUS B is the following: {A MINUS B} = {A − B} = {A AND ~B} = {A∩~B}. The explanation uses a subset of attributes of R that is part of some functional dependency formula. Let x be a subset of R attributes; then "∀y∊x: {x MINUS y}↛y & y↛{x − y}" means "the subset x MINUS attribute y does not determine y, and vice versa". Now, let x: {P, L, C} and y: P; the case is: {L, C}↛{P} & {P}↛{L, C}. The remaining cases, i.e. "y: L" and "y: C", are left to the reader.

32.8.4 Meaning of A NOR B
The meaning of 'x NOR y' can be easily understood with the following steps:
1) Expand x NOR y ≝ NOT(x OR y);
2) Translate to sets: NOT(x OR y) = ~(A∪B);
3) A↓B ≝ ~(A∪B);
4) "A↓B = ~A∩~B" is the NOR formula for sets;
5) Now, we switch to sets of ordered pairs, e.g. {x→y}, in the following way: x↓y = {〈x, y〉: ∀z(〈z, x〉∉A AND 〈z, y〉∉B)}, taking into account that each point is a maximally independent subset of attributes such as (A·B·C), and that 〈x, y〉, e.g. x→y, operates on two different attributes of R (or on two disjoint subsets of the powerset of R).
Example: {IF x→y THEN (y→x NOR y↛x)}, whose meaning is "If x→y in R, this does not say whether or not y→x in R" [2007E&N].
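Because the universe of 32.7 is finite, every Boolean Algebra Rule of 32.7.2, and the NOR formula A↓B = ~A∩~B of 32.8.4, can be verified exhaustively over all subsets. A minimal sketch, assuming a hypothetical three-atom universe U = {p, q, r} and using Python's `|`, `&` and `-` set operators for ∪, ∩ and set difference; the helper names are illustrative.

```python
from itertools import combinations

U = frozenset({"p", "q", "r"})  # a small, hypothetical universe of atoms
EMPTY = frozenset()             # ∅ ≝ {x: x≠x}


def subsets(u):
    """All subsets of u (the members of its powerset)."""
    items = sorted(u)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def comp(a):
    """~A ≝ {x: x∊U AND x∉A}."""
    return U - a


def check_laws():
    """Assert every rule of 32.7.2 for all A, B, C ⊆ U; return how many subsets were tried."""
    S = subsets(U)
    for A in S:
        assert A | A == A and A & A == A                   # idempotent
        assert A | comp(A) == U and A & comp(A) == EMPTY   # complement
        assert A | EMPTY == A and A & U == A               # neutral element
        for B in S:
            assert A | B == B | A and A & B == B & A       # commutative
            assert A | (A & B) == A and A & (A | B) == A   # absorption
            assert comp(A | B) == comp(A) & comp(B)        # NOR for sets: A↓B = ~A∩~B
            for C in S:
                assert (A | B) | C == A | (B | C)          # associative (∪)
                assert (A & B) & C == A & (B & C)          # associative (∩)
                assert A & (B | C) == (A & B) | (A & C)    # distributive
                assert A | (B & C) == (A | B) & (A | C)    # distributive
    return len(S)


print(check_laws())  # 8 subsets checked, every law holds
```

An exhaustive check over a three-atom universe is not a proof for every universe, but it catches a mistyped law immediately.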

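The OBSERVE method of 32.8.2 and the A MINUS B test of 32.8.3 can be sketched on a table instance R*. An instance can only refute an FD: x↛y is observed when two rows agree on x but differ on y, while an FD that is not refuted is merely consistent with those rows, since an FD is a rule over every valid state of R. The sketch below assumes rows are Python dicts; `violates_fd` and `minus_test` are hypothetical names, and the P, L, C rows are made up so that every case of 32.8.3, including those left to the reader, is refuted.

```python
def violates_fd(rows, x, y):
    """True if the rows refute x→y: two rows agree on every attribute of x
    but differ on some attribute of y."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in x)
        val = tuple(row[a] for a in y)
        if seen.setdefault(key, val) != val:
            return True
    return False


def minus_test(rows, x):
    """32.8.3 on one instance: for every y∊x, both {x − y}↛y and y↛{x − y}
    are observed (refuted) in the rows."""
    for y in x:
        rest = [a for a in x if a != y]
        if not (violates_fd(rows, rest, [y]) and violates_fd(rows, [y], rest)):
            return False
    return True


# Hypothetical instance R* over the attributes P, L, C.
rows = [dict(zip("PLC", t)) for t in
        [(1, "a", "x"), (1, "b", "y"), (2, "a", "x"), (2, "b", "y"), (1, "b", "x")]]
print(minus_test(rows, ["P", "L", "C"]))     # True
print(violates_fd(rows, ["L", "C"], ["P"]))  # True: rows 1 and 3 share (a, x)
```

With only the first and fourth rows, no pair of rows agrees on {L, C}, so {L, C}↛P cannot be observed and `minus_test` returns False: the instance is simply too small to say anything.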

32.9 Database MAYBE Logic
Database MAYBE Logic [1979Cod] is the RM label for the standard Three-valued Logic [1920Luk]. This appendix is a succinct description of the environment in which the UNMARKED normal form performs. Database MAYBE Logic has the following subsections:
o MAYBE identity;
o TRUE table of maybe-OR;
o TRUE table of maybe-AND;
o On three-valued logic;
o On MAYBE logic for the r-model; and
o MAYBE query logic.

32.9.1 MAYBE identity
The RM/T system includes a set of database operators with the features of the standard Three-valued Logic, i.e. T, F and ω are possible results of every evaluated query. The ω symbol means 'unknown'. The MAYBE Identity of RM/T follows. Logical identity refers to the logical value of a predicate like "NAME = 'Smith';". The evaluation of each predicate can be true (abbr. T), false (abbr. F) or unknown (abbr. ω). A formal definition of MAYBE Identity follows:
MAYBE Identity ≝ {NOT(F)=T; NOT(ω)=ω; NOT(T)=F}
Any mix of "IS NULL" and "NOT NULL" columns in some predicate (of the WHERE clause of a query) triggers the MAYBE Logic.

32.9.2 TRUE table of maybe-OR
Each evaluation step has three possibilities, which illustrate the MAYBE concept for the OR connector:
OR F ω T
F F ω T
ω ω ω T
T T T T

32.9.3 TRUE table of maybe-AND
Each evaluation step has three possibilities, which illustrate the MAYBE concept for the AND connector:
AND F ω T
F F F F
ω F ω ω
T F ω T

32.9.4 On three-valued logic
Imagine a continuum of probabilities between F (0) and T (1) during a predicate evaluation; the "Three-valued Logic" [1920Luk] is just an instance of "Multivalued Logics" [1974Dea]. «Three-valued Logic plays with three predicate states: F (0.0 probability of true), T (1.0 probability of true), and ½ (0.5 probability of true), which is the new predicate state» [1974Dea]. Codd, in the tables above, uses the symbol ω for the intermediate predicate evaluation; ω, read 'unknown', has the same meaning as the ½ predicate state of Łukasiewicz. «The TRUE tables of the Boolean AND and OR operators (above) map the new Łukasiewicz's thoughts, but they follow classic criteria of truth» [1974Dea]. «Let us consider that the TRUE value is the most distinguished, the "best". If one of the sentences is true and the other is false, the conjunction of the two is always false; roughly: in two-valued logic, the conjunction always takes the worse value of its arguments. The same occurs in three-valued logic, where the conjunction prefers the worse part: if one of its members is true and the other is undetermined, the joint sentence will be undetermined; if one of them is undetermined and the other is false, the conjunction will be false. In the disjunction the opposite occurs: it always takes the better part, and so it is in three-valued logic; between true and undetermined, the true value; between false and undetermined, the undetermined value» [1974Dea].

32.9.5 On MAYBE logic for the r-model
«We shall concern ourselves with only the "value at present unknown" type of null and denote it by ω» [1979Cod]. «The first question which arises is: what is the truth value of x = y if x or y or both are null? An appropriate result in each of these cases is the unknown truth value, rather than true or false. Accordingly, we adopt a three-valued logic for use in extracting data from databases that may contain null values. We use the same symbol "ω" to denote the unknown truth value, because truth values can be stored in databases and we want the treatment of all unknown values to be uniform» [1979Cod].

32.9.6 MAYBE query logic
The query manager reads each row and, having the values of the mentioned columns, e.g. DriverLic, DateOfBorn, etc., evaluates each predicate in ternary fashion: B'00' means F, B'01' means ω, B'10' means T. The predicate evaluation ends with n values forming an enumeration of n ternary states. The evaluation of the whole predicate expression, i.e. whether this particular row is part of the target set or not, again ends with one of the {F, ω, T} values. The row is rejected only if this last evaluation is F. Otherwise (ω or T), the row is accepted as part of the target set of rows.

32.10 The three levels of R architecture
A relational database according to the r-Model of E.F. Codd & C. Date, as instantiated by many r-DBMS software products, has a Software Development Architecture with three levels. The ERA model has always been a data model, instantiated through strong data modeling, that supports the Logical Design phase of Functional Analysis; it is currently essential in the Software factory concept.

EXTERNAL LAYER
ERA Model: [N/A]
Relational Model (RM/T): The SQL view (formally, Derived table) is the sole external object. SQL views and their columns have names, but their rows are virtual, i.e. they are stored in true tables. It is the programmer's layer (internet, OLTP and batch processes) using native SQL embedded in any programming language. It is also the end-user layer, e.g. Data warehouse analysts using OLAP native tools that generate SQL (e.g. MicroStrategy).

BASE LAYER
ERA Model: Entity and Relationship are the design units.
Relational Model (RM/T): Base table (or Base relation) is the sole object of this layer; its columns are catalogued in the relational dictionary and each row occupies a physical place on disk.
Data model | Relational schema
Entity | Base table
Attribute (deprecated by Property) | Attribute (or Column)
Identifier or Subject | Primary key
1:1 relationship | Denotative table (3NF.C)
1:M relationship | Characteristic table (3NF.B)
N:M relationship | Associative table (3NF.A)
IsA binary relation | Subtable (3NF.C; includes IsA)
Dimensional entity | Dimensional table (3NF.E), part of a Star schema

PHYSICAL LAYER
ERA Model: [N/A]
Relational Model (RM/T): Physical objects, i.e. performance-oriented internal objects:
⇨ Indexes;
⇨ Data clustering;
⇨ Table partitions;
⇨ View materializations;
⇨ Space compression;
⇨ Block compression;
⇨ Memory size for data;
⇨ Page size.
Customize a hierarchy of discs of different sizes in a Hard Disk Storage Server to reach the maximum parallelism of the I/Os from physical mainframe channels, in the case of a database Upper index [1989SAG] of Inverted lists.


32.10.1 Conceptual, Logical and Physical designs
The articulation of the three phases is to complete the conceptual design before committing to the logical one, and to finish the logical design before undertaking an initial physical design. The protocol for this last deliverable is that the Physical Design documentation should be a true reflection of what was delivered and accepted by Production after the stress test.

32.10.2 Desiderata on deliverables diagramming
The state of the art in software project documentation appears sufficient, but it is unbalanced in favor of the first two phases:
• There are a lot of diagrams for the data model and the relational schema (blue zone) [2016Luc];
• The views and the aggregated entities, such as the central table of a star DW schema, merit having standard diagrams;
• Diagrams for the configuration of the disks, SQL spaces, the selectable types of indices, and the options of data compression at disk, space, variable block, record and field level [1989SAG] would be much appreciated;
• The arcana of the Hard Disk Storage Server should be opened up with standard settings for every r-DBMS product, accompanied by diagrams of this spectacular hardware supporting relational databases.

32.11 R-DBMS performance oriented duties
r-DBMS performance is part of the System programmer's and DBA's duties. This appendix is just a quick reminder of how to tune a whole Hardware-Software system, based on the particular experience of the author over many happy years.
(1) Reach a steady state of the system at 95% CPU (probably 80%+ for the main DBMS engine) at peak time during rush hours.
(2) Balance for similar pagination rates in all main DBMS buffers.
(2a) Balancing pagination is a matter of relative buffer sizes;
(2b) Current minimal-size buffers are outside this management;
(2c) Materializations must paginate at the same pace as the other buffers;
(2d) Physical I/O average duration is 99% of user response time;
(2e) Each DBMS program must have a known median response time, in order to allow physical I/O monitoring and resource attribution;
(2f) In an Operating System that runs user batch processes in memory address spaces different from the DBMS, the DBA can show how to divert report printing from the TP environment to some of these batch OS processes, increasing the performance of multi-CPU computers, printing the report on a local printer and returning it to its owner via the usual paper distribution policy;
(2g) The "paper distribution policy" of the company can be "zero printed reports", attaching the report to an in-house e-mail;
(2h) Suspend the president's exclusive tube and high priorities during rush hours.
(3) Manage database indexes according to the protagonist modules of peak time.
(4) Monitor query materializations every day for new applications until the steady state in rush hours is reached.
(4a) Query materialization statistics should be followed in order to kill the non-reused ones.
(5) Propose SQL query and program refinements according to the worst MTBF;
(5a) Re-coding the current worst-MTBF program or function from scratch is an option for committed programmers, taking into account the disproportionate cost of rewriting code vs. writing initial code [1979Y&C].
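The MAYBE identity and the maybe-OR and maybe-AND TRUE tables of 32.9, together with the row-acceptance rule of 32.9.6, can be sketched in a few lines. This is a minimal sketch under stated assumptions: it reuses the two-bit codes of 32.9.6 (B'00' = F, B'01' = ω, B'10' = T), whose numeric order F < ω < T turns "the conjunction takes the worse value" into `min` and "the disjunction takes the better value" into `max`; `None` stands for a null mark; all function names are hypothetical. The `accept_unknown=False` option reflects the standard SQL behaviour, in which a WHERE clause keeps only the rows whose condition evaluates to TRUE.

```python
# Two-bit codes of 32.9.6: B'00' = F, B'01' = ω, B'10' = T (ordered F < ω < T).
F, OMEGA, T = 0b00, 0b01, 0b10
NAME = {F: "F", OMEGA: "ω", T: "T"}


def maybe_not(v):
    """MAYBE identity: NOT(F)=T, NOT(ω)=ω, NOT(T)=F."""
    return T - v


def maybe_and(a, b):
    """maybe-AND takes the worse value of its arguments."""
    return min(a, b)


def maybe_or(a, b):
    """maybe-OR takes the better value of its arguments."""
    return max(a, b)


def maybe_equals(x, y):
    """x = y is unknown (ω) if x or y or both are null (None here)."""
    if x is None or y is None:
        return OMEGA
    return T if x == y else F


def row_accepted(value, accept_unknown=True):
    """32.9.6 rule: reject the row only when the final value is F.
    accept_unknown=False keeps only T, as a SQL WHERE clause does."""
    return value != F if accept_unknown else value == T


for a in (F, OMEGA, T):  # prints the maybe-AND table row by row
    print(NAME[a], *(NAME[maybe_and(a, b)] for b in (F, OMEGA, T)))

names = ["Smith", None, "Jones"]  # hypothetical NAME column values
print([row_accepted(maybe_equals(n, "Smith")) for n in names])         # [True, True, False]
print([row_accepted(maybe_equals(n, "Smith"), False) for n in names])  # [True, False, False]
```

With this coding NOT is the subtraction T − v, so the whole MAYBE identity needs no lookup table.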


33. ALPHABETIC INDEX 1.1 THE DATABASE PREDICATE 1 1.2 THE PREDICATE SUBJECT 1 1.3 SQL: INTERGALACTIC DATA SPEAK 1 1.4 SQL: WYSIWYG 1 1.5 A VERY STABLE DATA DESIGN 1 1.6 R-DBMS WALK PARK VIEW 1 1.7 DATABASE DICTIONARY 1 1.8 THIRD NORMAL FORM FLASH 1 1.9 THIRD NORMAL FORM MELODY 1 1.10 DATA COUPLING VS. COHESIVE DATA 2 1.11 THIRD NORMAL FORM CANON 2 1.12 THIRD NORMAL FORM OVERTURE 2 1.12.1 Foreign key 2 1.12.2 Foreign key is a 3NF tool 2 1.13 THIRD NORMAL FORM REFRAIN 2 1.14 THIRD NORMAL FORM NUTSHELL 2 1.15 THIRD NORMAL FORM ENVIRONMENT 2 1.15.1 Preserving database programs 2 1.16 NO ATTRIBUTE MIGRATION 2 1.17 COHESIVE SOFTWARE 2 1.18 COMMUNICATION STRUCTURE 3 2. PREDICATE SERIES 3 2.1 HUGHES & LONDEY: PM PREDICATE 3 2.1.1 Predicate property 3 2.2 RUSSELL: XRY PREDICATE 3 2.2.1 xRy represents a binary relation 3 2.3 HEIJENOORT: INCREMENTAL APPROACH 3 2.3.1 Subject and predicating roles 3 2.4 FREGE: MORE THAN ONE SUBJECT 3 2.5 CARNAP: MORE THAN ONE PROPERTY 3 2.5.1 Descriptive functor 4 2.5.2 Definite denotation 4 2.5.3 Descriptive predicate 4 2.6 MADDUX: A SET OF ORDERED PAIRS 4 2.7 CODD: A SET OF POLYADIC PREDICATES 4 2.7.1 Predicate intension 4 2.7.2 Predicate extension 4 2.7.3 Subject of a polyadic predicate 4 3.1 THE NORMAL FORM 4 3.1.1 Logical & stable data design 4 3.1.2 Column & rows vs. 
physical stuff 4 3.1.3 Unique identifiers 5 3.1.4 Logical ℝ definition language 5 3.1.5 Say 'what' you want not 'how' 5 3.2 DATABASE PREDICATE 5 3.2.1 PM identity of database predicate 5 3.2.2 Data integrity 5 3.3 ELEMENTOF 5 3.3.1 ElementOf is part of FD theory 5 3.3.2 ElementOf closes a semantic gap 5 3.4 PREDICATE-PREDICATE CONNECTION 5 3.1 UNDISCIPLINED PREDICATE 5 3.1.1 Convoluted predicate 5 3.1.2 Impairing the PM subject unity 5 3.1.3 Undisciplined data design 6 3.1.4 Transitive key dependence 6 3.2 DISCIPLINED DATA DESIGN 6 3.3 DISCIPLINED PREDICATES 6 3.3.1 RM/T predicates are in 3NF 6 3.3.2 Zero defect data design 2.0 6 3.4 DATABASE PREDICATE STRUCTURE 6 3.4.1 The subject 6 E. Villar THIRD NORMAL FORM FUNDAMENTALS

3.4.2 Descriptive part 6 3.4.3 Denotative part 6 3.5 THE FOUR + ONE RM/T PREDICATES 6 3.5.1 Class predicate 6 3.5.2 Kernel predicate 6 3.5.3 Characteristic predicate 6 3.5.4 Associative predicate 6 3.5.5 Denotative predicate 7 4. DATABASE RELATION 7 4.1 RELATION, RELATIONAL, RELATIVE 7 4.1.1 Relation noun 7 4.1.2 Relational adjective 7 4.1.3 Relative adjective 7 4.2 RELATION, ATTRIBUTE, DOMAIN, DATATYPE 7 4.2.1 Coordinates of a database cell 7 4.3 RELATION ON DOMAINS 7 4.3.1 Built-in domain 7 4.4 RELATION ON ATTRIBUTES 7 4.4.1 Surface and deep structures 7 4.5 RELATION ON TUPLES 8 5. DATABASE TABULATION 8 5.1 EXAMPLE OF TABULATION 8 5.1.1 Original ASSIGN credits 8 5.1.2 Tabulation of an airport panel 8 5.1.3 Restrictions on PANEL 8 5.2 TABULATION MAP 9 5.3 TABULATION BODY PROPERTIES 9 5.4 BUILT-IN DOMAIN 9 5.4.1 Prosaic datatypes 9 5.4.2 UOM of quantity column 9 5.4.3 Non-prosaic datatypes 9 5.5 VARCHAR & BLOB RM VALUE 9 5.5.1 Varchar & blob structures 9 5.6 DATATYPES FOR TABULATIONS 9 5.6.1 Missing and inapplicable marks 9 5.7 PROBING 3NF TABULATION 9 5.8 COOKING R TABULATION 10 5.9 GALLUP TABULATION 10 5.10 CATALOG OF TABULATIONS 10 5.10.1 The normal virtuous circle 10 5.11 STANDARDIZING R TABULATION 10 5.12 TABULATION ORIENTED DML 10 5.13 3NF MATHEMATICAL PROOF 10 5.13.1 The 3NF human factor 11 6. DATABASE DEPENDENCY 11 6.1 PROJECT OPERATOR 11 6.2 ACTIVE DOMAIN 11 6.2.1 Composite active domain 11 6.3 DEPENDENCY GRAPH 11 6.3.1 Dedekind's transform 11 6.3.2 The relational graph 12 6.4 FUNCTIONAL DEPENDENCY 12 6.4.1 True functional dependency 12 6.5 R & FD SPECS IN CODD NOTATION 12 6.5.1 Reminder of 1NF compliance 12 6.6 CAVEAT ON OUR FD VERSIONNING 12 6.7 FUNCTIONAL NONDEPENDENCY 12 6.8 NONDETERMINES EXTENDED DEFINITION 12 6.9 DEPENDENCY GRAPH INTERPRETATION 12 6.10 FIRST FD FORMALIZATIONS 13 6.11 DELOBEL & CASEY FD PROPERTIES 13 6.12 ARMSTRONG FD AXIOMS 13 6.13 DEPENDENCY BOOLEAN ALGEBRA 13 6.13.1 Determines (original) 13 2016-01-2412:51:18 Page 71 of 76

6.13.2 Relative UNION 13 6.13.3 Relative INTERSECTION 13 6.13.4 Determined (converse) 13 6.13.5 Relative Identity 13 6.13.6 Relative Diversity 13 6.13.7 Nondetermines (reserved) 13 6.13.8 Nondetermined 13 6.14 ℛ BOOLEAN ALGEBRA LEGEND 13 6.15 MULTIVALUED DEPENDENCY 1.2 14 6.15.1 FD & MVD integration 14 6.16 FUNCTIONAL DEPENDENCY FAMILY 1.2 CODE CARD 14 6.17 FUNCTIONAL CODEPENDENCY 14 6.18 FUNCTIONAL INDEPENDENCY 14 6.18.1 IndependentOf definition 14 6.19 FUNCTIONAL DEPENDENCY ON R SUBSETS 14 6.19.1 FD compares concatenated subset values 14 6.19.2 Providing subset were disjoint 14 6.19.3 Representation principle 14 6.19.4 Subsets of R 14 6.20 EVERY ATTRIBUTE OF Y DEPENDS ON FULL X 14 6.21 FULL SUBSET Y DEPENDS ON FULL SUBSET X 15 6.22 NONE ATTRIBUTE OF X DEPENDS ON THE FULL Y 15 6.23 BROCHURE OF FD VERSIONS 15 6.23.1 Functional Dependency 1.2 15 6.23.2 Functional dependency 2.1 15 6.23.3 Functional dependency 2.2 15 6.24 FUNCTIONAL DEPENDENCY 1.2 15 6.25 ELEMENTARY FD 15 6.26 TRIVIAL DEPENDENCY 15 6.26.1 Trivial FD + nontrivial FD = nontrivial FD 15 6.27 X→Y IS A BINARY RELATION 15 6.27.1 Specialties of FD 15 6.27.2 Observable FD is TRUE 15 6.27.3 x↛y possibility 15 6.27.4 Codd's FD specs of R 16 6.28 PROPERTIES OF XRY 16 6.28.1 xRy positive behavior 16 6.28.2 xRy negative behavior 16 6.28.3 xRy unpredictable behavior 16 6.29 RELATIVE PROPERTIES OF X→Y 16 6.29.1 Relative properties of x←y 16 6.30 RELATIVE PROPERTIES OF X↛Y 16 6.30.1 Relative properties of x↚y 16 6.31 FD SEMANTICS 16 7. MAXIMAL INDEPENDENT SUBSET ERROR! BOOKMARK NOT DEFINED. 
7.1 FUNCTIONAL DEPENDENCY 2.1
7.1.1 Functional independency 2.1
7.1.2 Functional dependency 2.1
7.1.3 Functional converse dependency 2.1
7.1.4 Functional codependency (original)
7.1.5 Functional dependency 2.1 code card
7.2 EVOLUTION OF DEPENDENCY CONNECTORS
7.3 FD 1.1 INSTANTIATIONS
7.3.1 x⊋y instantiation
7.3.2 Formal x=x instantiation
7.4 MULTIATTRIBUTE
7.4.1 Irredundant multiattribute
7.4.2 Maximal independent subset (miβ)
7.4.3 Irredundant multiattribute of R
7.5 MULTIATTRIBUTE NOTATION
7.5.1 Unchecked multiattribute
7.6 MULTIATTRIBUTE MISCELLANEA
7.6.1 Multiattribute corollary
7.6.2 Multiattribute default
7.6.3 Black-box variable
7.6.4 Irredundant attribute
7.7 REDUNDANT MULTIATTRIBUTE
7.8 MULTIATTRIBUTE CONTEXT
7.8.1 Familiar multiattributes
7.8.2 Formal multiattributes
7.9 FROM BINARY TO POLYADIC RELATIONS
7.10 COMPOSING IRREDUNDANT SUBSETS
7.11 CONSTRUING A POLYADIC MIß
7.12 ASSEMBLE A TRIADIC MIß
7.13 DEPENDENCY STRUCTURE
7.13.1 Dependency nodes
7.13.2 Dependency connectors
7.13.3 Dependency brick
7.13.4 Condensed structure
7.13.5 Tree from One Source digraph
7.13.6 FD is a Codd RM research
7.14 CONDENSED DEPENDENCY STRUCTURE
7.15 FUNCTIONAL DEPENDENCY 2.2 SPECS
7.16 DEPENDENCY STRUCTURE OF PANEL
7.16.1 Primitive dependency specs
7.16.2 Determinator/PK selection
7.16.3 Determinant/PK selection
7.16.4 Representable FD structure
8.1 BASE RELATION
8.2 DERIVED RELATION (SQL VIEW)
8.3 CANDIDATE KEY
8.3.1 One CK always exists
8.3.2 Each attribute depends on each CK
8.3.3 Structure of a multiattribute CK
8.4 MINIMAL SUBSET OF R
8.4.1 CK minimality is a steady state
8.4.2 Nonkey minimality is a steady state
8.4.3 Any attribute is minimal
8.5 PRIMARY KEY
8.6 ALTERNATE KEY
8.6.1 Alternate key definition
8.6.2 AK component IS NOT NULL WITH DEFAULT
8.6.3 AK component IS NULL
8.7 PRIMARY KEY PROPERTIES (1)
8.7.1 PK existence
8.7.2 PK uniqueness
8.8 AXIOM OF MINIMAL PK
8.8.1 Postulate of minimal CK
8.8.2 Corollary on minimal CK
8.8.3 Corollary on PK minimality
8.9 PRIMARY KEY PROPERTIES (2)
8.9.1 Each PK component IS NOT NULL
8.9.2 Each PK component has a genuine value
8.10 PRIMARY KEY PROPERTIES (3)
8.10.1 Exactly one primary key
8.10.2 When designer creates the PK
8.10.3 Each entity has its own PK type
8.11 FOREIGN KEY
8.11.1 Properties of the foreign key
8.11.2 Examples of referencing entities
8.11.3 Example of referenced only entity
8.11.4 FK glues the schema
8.12 PRIMARY KEY PROPERTIES (4)
8.12.1 R.PK cannot be FK in R entity
8.12.2 R.PK as descriptive attribute of S
8.12.3 R.PK as part of S PK
8.12.4 R.PK as denotative attribute of S
9. DATABASE INTEGRITY
9.1 INITIAL TABLE LOAD
9.2 DATA SECURITY
9.3 THE DATUM
9.4 TABLE INTEGRITY BUTTONS
9.4.1 INSERT statement
9.4.2 UPDATE statement
9.4.3 DELETE statement
9.4.4 COMMIT statement
9.5 TRANSACTION CONCEPT
9.5.1 Transaction & database integrity
9.5.2 Transaction internals
9.6 RAS OF A DATABASE
9.7 MEAN TIME BETWEEN FAILURES
9.7.1 Table RAS
9.7.2 Database RAS
9.8 DATA INTEGRITY SCOPE
9.9 RM DATA INSPECTION
9.10 CODD'S TWELVE RULES OF 1985
9.11 RM DATA INTEGRITY
9.11.1 Information rule
9.11.2 Atomic datum coordinates
9.11.3 Comprehensive data sublanguage
9.12 RM INTEGRITY CONSTRAINTS
9.12.1 Entity integrity (primary key)
9.12.2 Referential integrity (foreign key)
9.12.3 Definable user integrity
9.12.4 Non-subversion rule
9.13 RM FACILITIES
9.13.1 View updating rule
9.13.2 Physical data independence
9.13.3 Logical data independence
9.13.4 Distribution transparency
9.14 RM HIGH QUALITY SERVICES
9.15 ON THE SYSTEMATIC TREATMENT OF NULL
10. DATABASE REDUNDANCY
10.1 REDUNDANCY TROUBLES
10.2 DATA LOST
10.2.1 Delete anomaly
10.3 DATA CORRUPTION
10.3.1 Update anomaly
10.4 UNEXPECTED INSERTION DELAY
10.4.1 Insert anomaly
10.5 FIRST CUT ON DATA REDUNDANCY
10.5.1 Beware of Greeks bearing gifts
10.6 REDUNDANCY EFFECTS ARE VACANT
10.7 DATA DESIGN CAUSING REDUNDANCY
10.7.1 Partial dependence
10.7.2 Transitive dependence
10.7.3 Internal dependence
10.8 SET OF SOPHISTICATED PREDICATES
10.8.1 Class predicate
10.8.2 Associative predicate
10.8.3 Denotative predicate
10.8.4 Kernel predicate
10.8.5 Characteristic predicate
10.9 BUSINESS INTEGRITY RULE
10.9.1 Failsafe redundancy
10.9.2 Failsafe irredundancy
10.10 3NF IS A NATURAL DATA DESIGN
10.10.1 3NF is the RM/T default
10.11 ZERO DEFECT DATA
10.12 IF {ZERO DEFECT DATA DESIGN} …
10.13 … THEN {ZERO DEFECT DATA}
10.14 3NF AND DBMS PERFORMANCE
10.15 OCCASIONALLY HOMER TAKES A NAP
10.16 NORMALLY HOMER IS HOMER
10.17 NORMALIZATION TRADE-OFFS MEME
10.17.1 A sample of normalization meme
11. IRREDUCIBLE UNF DATA MODEL
11.1 STORAGE REPRESENTATION DETAILS
11.2 DATA INTEGRITY & DATA DEPENDENCE
11.3 UNNORMALIZED RELATION
11.4 UNNORMAL FORM
11.5 UNNORMAL FORM FAMILY
11.5.1 Disk file physical organizations
11.5.2 Hierarchical structures
11.5.3 CODASYL structures
11.5.4 Object oriented structures
11.6 DATABASE UNF OBJECTS
11.6.1 1997 Database objects
11.6.2 1970 Database objects
11.7 UNNORMAL FORM PROTOCOL
11.7.1 UNF flat records are out of protocol
11.8 IRREDUCIBLE UNNORMAL FORM
11.8.1 Original UNF protocol credits
11.8.2 Class normalization in 3ONF
11.9 UNF RECORD UNIQUENESS
11.9.1 UNF ISN & 1NF ROWID
11.10 UNF DBMS REDUCES REDUNDANCY
11.11 UNF DATABASE REDUNDANCY
11.11.1 UNF generic protocol
11.11.2 UNF partial dependences
11.11.3 UNF-DBMS FILE redundancy
11.12 UNF PK IS A MINIMAL COMPOSITION
11.13 FIRST NORMALIZATION
11.13.1 Normalization proceeds as follows.
11.14 RM PK IS A MINIMAL COMPOSITION
12. ZERO NORMAL FORMS
12.1 ZERO NORMAL FORM
12.1.1 Zero normal form protocol
12.2 R NAME INCLUDES A DATABASE VALUE
12.3 INFLATED PRIMARY KEY
12.3.1 UNF redundant ISAM index
12.3.2 Designated primary key
12.3.3 RM candidate keys
12.3.4 RM designated keys
12.3.5 Inflated candidate key
12.3.6 No minimal candidate key
12.3.7 Redundant candidate key
12.4 PK COMPONENT HAS NOT A GENUINE VALUE
12.4.1 Alternate design with genuine PK values
12.4.2 Genuine PK values with a surrogate
12.5 VECTOR OF ATTRIBUTES
12.6 RANGE OF ATTRIBUTES
12.6.1 First-cut in favor of range approach
12.6.2 RM Mongolian hordes
12.6.3 Interval datatype for temporal data
12.7 HORIZONTAL RELATION
12.7.1 Simple horizontal relation
12.7.2 Horizontal relation
12.7.3 Sophisticated horizontal relation (1)
12.7.4 Sophisticated horizontal relation (2)
12.7.5 Cubik normal form
12.8 CATALOG OF DESIGN PITFALLS
12.8.1 Constant column
12.8.2 Multiunit column
12.8.3 Enumerated attribute
12.8.4 Micro-attribute
12.8.5 Metarow domain
12.8.6 Metadata row
12.8.7 Edited attribute
12.8.8 Hidden computed column
12.8.9 Multidomain column
12.8.10 Hidden UNF structure
12.8.11 Hidden derived table
13. DESIGNER RELATION
13.1 NEW RELATION STRUCTURES
13.2 THE THREE PK ASPECTS OF 1NF
13.2.1 1NF PK ignored
13.2.2 1NF minimal PK as 2NF fulcrum
13.2.3 1NF minimal PK with PD & TD
13.3 FIRST NORMAL FORM
13.3.1 InternalNF procedure
13.4 PRIMARY KEY (PK)
13.5 PREDICATING PART (DAß)
13.6 CANDIDATE KEY (CK)
13.7 PRIMARY DOMAIN
13.8 PRIMARY DOMAIN TRANSPARENCY
13.9 ALTERNATE KEY VANISHMENT
13.9.1 No room for alternate key specs
13.9.2 Occam razor's instance
13.9.3 PK-FK-only data model
13.10 AK DESCRIPTIVE FASHION
13.10.1 Denoting & sorting
13.11 MINOR-KEY DOMAIN (ĸ)
13.11.1 On AK as business constraint
13.11.2 On AK as one-to-one
13.11.3 On pure single AK
13.12 NONKEY DOMAIN (∼ĸ)
13.13 NONKEY DOMAIN IS REDUCIBLE
13.13.1 Minimal nonkey subset of R
13.13.2 Frozen part of nonkey domain
13.14 CODEPENDING VS. DENOTING
14. DATABASE NORMALIZATION
14.1 PARTIAL DEPENDENCE
14.2 WEAK PARTIAL DEPENDENCE
14.2.1 Weak redundancy coexists with PD
14.3 FULLY DEPENDENT
14.4 SECOND NORMAL FORM
14.4.1 2NF express
14.4.2 2NF formal definition
14.4.3 SecondNF procedure
14.5 R DEPENDENCY STRUCTURE
14.6 PRIMITIVE DEPENDENCY
14.6.1 Intransitive deep sentences of R
14.6.2 Transitive dependencies
14.6.3 Independent lines
14.6.4 Trivial lines
14.7 TRANSITIVELY DEPENDENT
14.8 WEAK TRANSITIVE REDUNDANCY
14.9 WEAK REDUNDANCY METHOD
14.9.1 Weak redundancy role
14.10 INTRANSITIVELY DEPENDENT
14.11 DETERMINATOR AXIOM
14.11.1 Determinator singularity (?)
14.12 ADJACENT DEPENDENT NODE
14.12.1 Intransitive structure > Transitive reduction
14.13 IMMEDIATELY DEPENDENT AND ADJACENT
14.14 THIRD NORMAL FORM
14.14.1 Optimal 3NF (1971)
14.14.2 Optimal 3NF (1974)
14.14.3 3NF express
14.14.4 ThirdNF procedure
14.14.5 FK optimizes transitive checks
15. THIRD NORMAL FORM 2.0
15.1 DESIGN DEFECTS INDEPENDENCY
15.2 REMINDER ON PRIMARY KEY
15.3 FIRST NORMAL FORM 2.0
15.4 SECOND NORMAL FORM 2.0
15.5 ALICE NORMAL FORM 2.0
15.6 THIRD NORMAL FORM 2.0
15.7 LOGICAL RECORD DESIGN
15.8 FIRST-CLASS BUSINESS MODEL
15.9 OVERLAPPING MIßES
15.9.1 A normal PK selection case
15.10 THIRD NORMAL FORM CATEGORIES
15.10.1 The four categories of 3NF tables
15.11 HEATH: 3NF CATEGORIES
15.11.1 3NF category 1
15.11.2 3NF category 2
15.11.3 3NF category 3
15.11.4 TNF category 4
15.11.5 Criticism on Heath's TNF category 4
15.12 3NF CATEGORY 4
15.12.1 Example of 3NF category 4
15.13 3NF CATEGORY 5
15.13.1 Example of 3NF category 5
15.14 RESUME OF 3NF CATEGORIES
16. DATA DESIGN REVIEW
16.1 THE ART OF DATABASE NORMALIZING
16.2 DATA REVIEW USING FD
16.2.1 Semantic interpretation
16.2.2 Dependency substructures
16.2.3 Substructures representation
16.2.4 Cohesive database predicate
16.3 NORMALIZATION OF A LEGACY SCHEMA
16.4 NORMALIZATION OF A TABLE
16.5 NORMALIZATION SUMMARY
16.6 NORMAL PROC
16.7 OVERVIEW OF NORMALIZATION
16.7.1 Normalization prolog
16.7.2 FirstNF method
16.7.3 SecondNF method
16.7.4 ThirdNF method
16.7.5 Normalization epilog
16.8 PROLOG OF DATA REVIEW
16.9 ACTIVE FIRST NORMAL FORM
16.9.1 Internal dependence
16.9.2 Internal redundancy
16.9.3 Internal refactoring
16.9.4 ID semantical interpretation
16.10 SECOND NORMAL FORM
16.10.1 Partial refactoring
16.10.2 PD semantical interpretation
16.11 THIRD NORMAL FORM
16.11.1 Transitive refactoring
16.11.2 TD semantical interpretation
16.12 EPILOG OF DATA REVIEW
17. HOLISTIC DATA DESIGN
17.1 3NF ON DATABASE STRUCTURES
17.2 MAIN CLASSES OF 3NF PREDICATES
17.2.1 Novo days classes of 3NF patterns
17.3 INTERNET 3NF DATABASE CHALLENGE
17.4 ᾹSSOCIATIVE PATTERNS
17.4.1 Minimal class Ᾱ
17.4.2 (A) Associative table
17.4.3 (AR) Reflexive table
17.4.4 Quine triadic association
17.5 ℬREAD & BUTTER PATTERNS
17.5.1 (K) Kernel pattern
17.5.2 Descriptive family
17.5.3 Characteristic table pattern
17.5.4 (G) Grouping table pattern
17.6 CHARACTERISTIC PRIMARY KEY
17.6.1 Tag name
17.6.2 Definite denotation
17.6.3 Composite tag
17.6.4 Recursive tag
17.7 ℂLUB & HEATH PATTERN
17.7.1 One-to-one denotative pattern
17.8 ⅅ: {ℬREAD & ℂLUB UNION} PATTERNS
17.8.1 Embedded 1:1 pattern
17.8.2 Denotative table
17.8.3 Molecular relation
17.9 ℬ⁻¹: (DIMENSIONAL PATTERN)
17.9.1 3NF Class B⁻¹
17.9.2 DW multidimensional schema
17.9.3 Sixth Normal Form & 3NF Class B⁻¹
17.10 PALETTE OF PRIMARY KEYS
17.11 EACH ENTITY TYPE, ITS 3NF PREDICATE
18. LEGO® DATA DESIGN
18.1 LEGO® DATA DESIGN CREDITS
18.2 ERA OBJECTS
18.3 3NF DECALOGUE
18.4 ENTITY DEFINITION
18.4.1 Entity predicate
18.4.2 Entity property
18.4.3 Pertinent attribute
18.4.4 Entity notation
18.5 PROPERTY DEFINITION
18.6 AXIOM 0: ENTITY NAME IS NEUTER
18.7 AXIOM 1: EACH ATTRIBUTE ITS SEMANTIC
18.7.1 An attribute is irredundant
18.8 AXIOM 2: ATTRIBUTE IS NOT NULL
18.9 AXIOM 3: EACH ENTITY ITS ATTRIBUTE SET
18.10 (AXIOM 3)⁻¹: EACH ATTRIBUTE ITS ENTITY
18.11 AXIOM 4: A STANDALONE IDENTIFIER PER ENTITY
18.12 (AXIOM 4)⁻¹: ONLY ONE ENTITY IDENTIFIER
18.13 AXIOM 5: MINIMAL IDENTIFIER
18.14 AXIOM 6: FK WITH EXPLICIT SPECS
18.15 AXIOM 7: MAXIMIZE DESCRIPTIVE FK'S
18.16 AXIOM 8: ONLY A SINGLE PK CAN BE FK
18.17 AXIOM 9: MINIMAL NONKEY DOMAIN
19. EXTREME NORMALIZATION
19.1 OVERVIEW OF EXTREME NORMALIZATION
19.1.1 Quine normal form specialty
19.1.2 Schema normalization
19.2 EXTREME NORMALIZATION MENTAL MAP
19.3 EXTREME DESIGN CREDITS
19.4 UNSURPASSABLE THIRD NORMAL FORM
19.4.1 Some irreducible third normal forms
19.5 THESIS OF EXTREME NORMALIZATION
20. QUINE NORMAL FORM
20.1 ORDERED PAIR
20.2 QUINE THEORY ON POLYADIC RELATIONS
20.3 3NF TETRADIC EXAMPLE
20.4 QNF POLYADIC ANALYSIS
20.5 HIDDEN DERIVABLE ASSOCIATION
20.6 QNF ASSOCIATIVE METHOD
20.7 THE SELECTED BINARY SEED
20.8 SELECTING THE FIRST REFERENT
20.9 ADDING THE LAST REFERENT
20.10 CHECKING QNF COMPOSITION
20.11 QUINE NORMAL FORM OF KACP TABLE
20.11.1 KACPS database schema
20.11.2 SQL VIEW of original tabulation
21. ONTOLOGIC NORMAL FORM
21.1 ORPHAN ENTITY
21.2 ONTOLOGIC NORMAL FORM
21.3 ORPHAN PK COMPONENT
21.4 ONTOLOGIC NORMAL FORM (K)
21.5 ANSELMIAN PRIVILEGE
22. INCLUSION NORMAL FORM
22.1 INCLUSION DEPENDENCY
22.2 INCLUSION DEPENDENCY VS. SPECIALIZATION
22.3 INCLUSION DEPENDENCE
22.4 INCLUSION DEPENDENCE AS ERA CASE
22.5 INCLUSION DEPENDENCE AS RM CASE
22.5.1 Create 3NF inclusion schema
22.6 INCLUSION NORMAL FORM
23. UNMARKED NORMAL FORM
23.1 NORMALIZING EVERY MARK SPECS
23.2 ATTRIBUTE IS NULL
23.2.1 Missing value impairs FD analysis
23.2.2 Third normal form applies to value columns
23.2.3 Holistic data design assumes value columns
23.3 ATTRIBUTE IS NOT NULL WITH DEFAULT
23.4 UNMARKED NORMAL FORM IN A NUTSHELL
23.4.1 Every SQL column IS NOT NULL
23.5 CREATE MOLECULE
23.5.1 EMPLOYEE tabulation
23.5.2 Many i-marks per column
23.5.3 Many default values per column
23.6 UNMARKED NORMAL FORM
24. LACONIC NORMAL FORM
24.1 CREATE AMBIGUOUS MOLECULE
24.1.1 Ambiguous tabulation
24.2 LACONIC NORMAL FORM
24.2.1 Laconic family
25. OCCAM NORMAL FORM
25.1 CREATE PLURAL SCHEMA
25.1.1 Plural schema tabulations
25.2 OCCAM’S RAZOR
25.3 OCCAM NORMAL FORM
25.3.1 Occam schema tabulations
26. UNION NORMAL FORM
26.1 HORIZONTALLY PARTITIONED DATABASE TABLE
26.2 UNION NORMAL FORM
26.3 TRANSPARENT DISTRIBUTION
27. NATURAL NORMAL FORM
27.1 VERTICAL DATABASE PARTITION
27.2 NATURAL NORMAL FORM
27.3 NATURAL NORMAL FORM (PHASE 1)
27.4 NATURAL NORMAL FORM (PHASE 2)
27.5 TRANSPARENT VERTICAL PARTITION
28. DATABASE OPTIMIZATION
28.1 INTEGRATING THE 3NF MICRO-SCHEMAS
28.2 DIFFERENT TYPES OF DATABASE REDUNDANCY
28.2.1 Strong redundancy of R
28.2.2 Weak redundancy of R
28.3 HIDDEN REDUNDANCY
28.4 HIDDEN FUNCTIONAL DEPENDENCY
28.5 NORMAL STEPWISE REFINEMENT
28.6 OPTIMAL DATABASE SCHEMA
28.7 OPTIMAL THIRD NORMAL FORM
28.8 OPTIMAL 3NF SCHEMA
28.8.1 Free of strong redundancy
28.8.2 Free of weak redundancy
28.8.3 Free of hidden redundancy
31.1 DATA MODELLING
31.2 SYNOPSIS OF DATABASE TERMS
31.2.1 Entity type
31.2.2 Row cardinality & derived relation
31.2.3 Domain
31.2.4 Keys
31.2.5 Designer entities
31.2.6 Complex designer entities
31.3 PM IDENTITY
31.3.1 Well-formed formula
31.3.2 PM Identity definition
31.4 GLOSSARY OF PREDICATE TERMS
31.4.1 Predicates
31.4.2 Predicate Quantifiers
31.5 TRUTH TABLES
31.5.1 Truth table of AND
31.5.2 Truth table of OR
31.5.3 Truth table of IMPLIES
31.5.4 Truth table of NOR
31.5.5 Truth table of MINUS
31.6 LOGIC OF CLASSES
31.6.1 ElementOf and singular classes
31.6.2 Inclusion predicates
31.6.3 Boolean classes
31.6.4 Set predicates
31.6.5 Complex classes
31.7 FORMULAS AS SETS
31.7.1 Empty Set and Complement
31.7.2 Boolean Algebra Rules
31.8 FORMULAS AS RELATIONS
31.8.1 'binary relation' IsSynonymOf 'xRy'
31.8.2 Functional Dependency is a binary relation
31.8.3 Meaning of the set A MINUS B
31.8.4 Meaning of A NOR B
31.9 DATABASE MAYBE LOGIC
31.9.1 MAYBE identity
31.9.2 Truth table of maybe-OR
31.9.3 Truth table of maybe-AND
31.9.4 On three-valued logic
31.9.5 On MAYBE logic for the r-model
31.9.6 MAYBE query logic
31.10 THE THREE LEVELS OF R ARCHITECTURE
31.10.1 Conceptual, Logical and Physical designs
31.10.2 Desiderata on deliverables diagramming
31.11 R-DBMS PERFORMANCE ORIENTED DUTIES