<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta content="width=device-width,initial-scale=1" name="viewport">
<title>
Architecture of the World Wide Web (Second Edition)
</title>
<script class="remove" src=
"http://www.w3.org/Tools/respec/respec-w3c-common">
</script>
<script class="remove">
var respecConfig = {
specStatus: "ED",
shortName: "webarch",
//publishDate: "2012-12-12",
//previousPublishDate: "",
edDraftURI: "http://w3ctag.github.io/webarch/",
// lcEnd: "3000-01-01",
// crEnd: "3000-01-01",
editors: [{
name: "Ian Jacobs",
company: "W3C",
companyURL: "http://www.w3.org/",
note: "First Edition"
},
{
name: "Henry S. Thompson",
company: "University of Edinburgh",
companyURL: "http://www.inf.ed.ac.uk/",
note: "Second Edition"
}
],
wg: "Technical Architecture Group",
wgURI: "http://www.w3.org/2001/tag/",
wgPublicList: "www-tag",
wgPatentURI: "http://www.w3.org/2001/tag/disclosures",
otherLinks: [{
key: "Repository",
data: [{
value: "We are on Github.",
href: "https://github.com/w3ctag/webarch"
}, {
value: "File a bug.",
href: "https://github.com/w3ctag/webarch/issues"
}, {
value: "Commit history.",
href: "https://github.com/w3ctag/webarch/commits/gh-pages"
}
]
}
],
inlineCSS: true,
noIDLIn: true,
noLegacyStyle: false,
extraCSS: ["../ReSpec.js/css/respec.css"],
};
</script>
</head>
<body>
<section id="abstract">
<p>
The World Wide Web uses relatively simple technologies with sufficient
scalability, efficiency and utility that they have resulted in a
remarkable information space of interrelated resources, growing across
languages, cultures, and media. In an effort to preserve these
properties of the information space as the technologies evolve, this
architecture document discusses the core design components of the Web.
They are identification of resources, representation of resource state,
and the protocols that support the interaction between agents and
resources in the space. We relate core design components, constraints,
and good practices to the principles and properties they support.
</p>
</section>
<section id="sotd">
<p>
This is an unofficial draft and work in progress. It has no official
standing.
</p>
<p>
<em>This section describes the status of this document at the time of
its publication. Other documents may supersede this document. A list of
current W3C publications and the latest revision of this technical
report can be found in the <a href="http://www.w3.org/TR/">W3C
technical reports index</a> at http://www.w3.org/TR/.</em>
</p>
<p>
This document has been developed by W3C's <a href=
"http://www.w3.org/2001/tag/">Technical Architecture Group (TAG)</a>,
which, by <a href="http://www.w3.org/2001/07/19-tag">charter</a>,
maintains a <a href="http://www.w3.org/2001/tag/issues.html">list of
architectural issues</a>. The scope of this document is a useful subset
of those issues; it is not intended to address all of them. The TAG
intends to address the remaining (and future) issues after publication
of Volume Two as a Recommendation.
</p>
<p>
This document uses concepts and terms regarding URIs as defined by the
IETF. In an <a href=
"http://www1.ietf.org/mail-archive/web/ietf-announce/current/msg00602.html">
18 Oct 2004 announcement</a>, the revision of RFC2396 was endorsed as
an IETF Specification, though the latest published draft as of this
writing is <a href=
"http://www.apache.org/~fielding/uri/rev-2002/rfc2396bis.html">draft-fielding-uri-rfc2396bis-07</a>.
The <a href="#URI">[URI]</a> citation should reflect publication of the
relevant RFC in future revisions.
</p>
</section>
<section id="principles">
<h3>
List of Principles, Constraints, and Good Practice Notes
</h3>
<p>
The following principles, constraints, and good practice notes are
discussed in this document and listed here for convenience. There is
also a <a href="summary.html">free-standing summary</a>.
</p>
<dl>
<dt>
Identification
</dt>
<dd>
<ul>
<li>
<a href="#pr-global-id">Global Identifiers</a> (principle, 2)
</li>
<li>
<a>Identify with URIs</a> (practice, 2.1)
</li>
<li>
<a>URIs Identify a Single Resource</a> (constraint, 2.2)
</li>
<li>
<a>Avoiding URI aliases</a> (practice, 2.3.1)
</li>
<li>
<a>Consistent URI usage</a> (practice, 2.3.1)
</li>
<li>
<a>Reuse URI schemes</a> (practice, 2.4)
</li>
<li>
<a>URI opacity</a> (practice, 2.5)
</li>
</ul>
</dd>
<dt>
Interaction
</dt>
<dd>
<ul>
<li>
<a>Reuse representation formats</a> (practice, 3.2)
</li>
<li>
<a>Data-metadata inconsistency</a> (constraint, 3.3)
</li>
<li>
<a>Metadata association</a> (practice, 3.3)
</li>
<li>
<a>Safe retrieval</a> (principle, 3.4)
</li>
<li>
<a>Available representation</a> (practice, 3.5)
</li>
<li>
<a>Reference does not imply dereference</a> (principle, 3.5)
</li>
<li>
<a>Consistent representation</a> (practice, 3.5.1)
</li>
</ul>
</dd>
<dt>
Data Formats
</dt>
<dd>
<ul>
<li>
<a>Version information</a> (practice, 4.2.1)
</li>
<li>
<a>Namespace policy</a> (practice, 4.2.2)
</li>
<li>
<a>Extensibility mechanisms</a> (practice, 4.2.3)
</li>
<li>
<a>Extensibility conformance</a> (practice, 4.2.3)
</li>
<li>
<a>Unknown extensions</a> (practice, 4.2.3)
</li>
<li>
<a>Separation of content, presentation, interaction</a>
(practice, 4.3)
</li>
<li>
<a>Link identification</a> (practice, 4.4)
</li>
<li>
<a>Web linking</a> (practice, 4.4)
</li>
<li>
<a>Generic URIs</a> (practice, 4.4)
</li>
<li>
<a>Hypertext links</a> (practice, 4.4)
</li>
<li>
<a>Namespace adoption</a> (practice, 4.5.3)
</li>
<li>
<a>Namespace documents</a> (practice, 4.5.4)
</li>
<li>
<a>QNames Indistinguishable from URIs</a> (constraint, 4.5.5)
</li>
<li>
<a>QName Mapping</a> (practice, 4.5.5)
</li>
<li>
<a>XML and "text/*"</a> (practice, 4.5.7)
</li>
<li>
<a>XML and character encodings</a> (practice, 4.5.7)
</li>
</ul>
</dd>
<dt>
General Architecture Principles
</dt>
<dd>
<ul>
<li>
<a>Orthogonality</a> (principle, 5.1)
</li>
<li>
<a>Error recovery</a> (principle, 5.3)
</li>
</ul>
</dd>
</dl>
</section>
<hr>
<section id="intro">
<h2>
Introduction
</h2>
<p>
The <dfn>World Wide Web</dfn> (<dfn><abbr>WWW</abbr></dfn>, or simply
<dfn><abbr>Web</abbr></dfn>) is an information space in which the items
of interest, referred to as resources, are identified by global
identifiers called Uniform Resource Identifiers
(<dfn><abbr>URI</abbr></dfn>).
</p>
<p>
Examples such as the following <dfn>travel scenario</dfn> are used
throughout this document to illustrate typical behavior of <dfn>Web
agents</dfn>—people or software acting on this information space. A
<dfn>user agent</dfn> acts on behalf of a user. Software agents include
servers, proxies, spiders, browsers, and multimedia players.
</p>
<div class="boxedtext">
<p>
<span class="storylab">Story</span>
</p>
<div class="story">
<p>
While planning a trip to Mexico, Nadia reads “Oaxaca weather
information: 'http://weather.example.com/oaxaca'” in a glossy
travel magazine. Nadia has enough experience with the Web to
recognize that "http://weather.example.com/oaxaca" is a URI and
that she is likely to be able to retrieve associated information
with her Web browser. When Nadia enters the URI into her browser:
</p>
<ol>
<li>The browser recognizes that what Nadia typed is a URI.
</li>
<li>The browser performs an information retrieval action in
accordance with its configured behavior for resources identified
via the "http" URI scheme.
</li>
<li>The authority responsible for "weather.example.com" provides
information in a response to the retrieval request.
</li>
<li>The browser interprets the response, identified as XHTML by the
server, and performs additional retrieval actions for inline
graphics and other content as necessary.
</li>
<li>The browser displays the retrieved information, which includes
hypertext links to other information. Nadia can follow these
hypertext links to retrieve additional information.
</li>
</ol>
</div>
</div>
<p>
This scenario illustrates the three architectural bases of the Web that
are discussed in this document:
</p>
<ol>
<li>
<p>
<a>Identification</a>. URIs are used to identify resources. In this
travel scenario, the resource is a periodically updated report on
the weather in Oaxaca, and the URI is
“http://weather.example.com/oaxaca”.
</p>
</li>
<li>
<p>
<a>Interaction</a>. Web agents communicate using standardized
protocols that enable interaction through the exchange of messages
which adhere to a defined syntax and semantics. By entering a URI
into a retrieval dialog or selecting a hypertext link, Nadia tells
her browser to perform a retrieval action for the resource
identified by the URI. In this example, the browser sends an HTTP
GET request (part of the HTTP protocol) to the server at
"weather.example.com", via TCP/IP port 80, and the server sends
back a message containing what it determines to be a representation
of the resource as of the time that representation was generated.
Note that this example is specific to hypertext browsing of
information—other kinds of interaction are possible, both within
browsers and through the use of other types of Web agent; our
example is intended to illustrate one common interaction, not
define the range of possible interactions or limit the ways in
which agents might use the Web.
</p>
</li>
<li>
<p>
<a>Formats</a>. Most protocols used for representation retrieval
and/or submission make use of a sequence of one or more messages,
which taken together contain a payload of representation data and
metadata, to transfer the representation between agents. The choice
of interaction protocol places limits on the formats of
representation data and metadata that can be transmitted. HTTP, for
example, typically transmits a single octet stream plus metadata,
and uses the "Content-Type" and "Content-Encoding" header fields to
further identify the format of the representation. In this
scenario, the representation transferred is in XHTML, as identified
by the "Content-Type" HTTP header field containing the registered
Internet media type name, "application/xhtml+xml". That Internet
media type name indicates that the representation data can be
processed according to the XHTML specification.
</p>
<p>
Nadia's browser is configured and programmed to interpret the
receipt of an "application/xhtml+xml" typed representation as an
instruction to render the content of that representation according
to the XHTML rendering model, including any subsidiary interactions
(such as requests for external style sheets or in-line images)
called for by the representation. In the scenario, the XHTML
representation data received from the initial request instructs
Nadia's browser to also retrieve and render in-line the weather
maps, each identified by a URI and thus causing an additional
retrieval action, resulting in additional representations that are
processed by the browser according to their own data formats (e.g.,
"application/svg+xml" indicates the SVG data format), and this
process continues until all of the data formats have been rendered.
The result of all of this processing, once the browser has reached
an application steady-state that completes Nadia's initial
requested action, is commonly referred to as a "Web page".
</p>
</li>
</ol>
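<p>
The retrieval and format-dispatch steps described above can be sketched
roughly as follows. This is an illustrative sketch only, not a description
of how any particular browser works: the function names, the handler table,
and the fallback behavior are assumptions made for the example, and no
network connection is opened. It shows how an agent might derive the host,
port, and request message from the "http" URI, and then select processing
according to the Internet media type in the "Content-Type" header field.
</p>

```python
from urllib.parse import urlsplit

# Default port per scheme; only "http" is needed for this sketch.
DEFAULT_PORTS = {"http": 80}


def build_get_request(uri):
    """Return (host, port, request_text) for retrieving an "http" URI."""
    parts = urlsplit(uri)
    if parts.scheme != "http":
        raise ValueError("this sketch only handles the http scheme")
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    # The Host field carries the port only when it is not the default.
    host_header = parts.hostname
    if port != DEFAULT_PORTS[parts.scheme]:
        host_header = f"{parts.hostname}:{port}"
    # An empty path is requested as "/"; a query, if any, is appended.
    target = parts.path or "/"
    if parts.query:
        target = f"{target}?{parts.query}"
    request_text = f"GET {target} HTTP/1.1\r\nHost: {host_header}\r\n\r\n"
    return parts.hostname, port, request_text


# Hypothetical handler table keyed by Internet media type.
HANDLERS = {
    "application/xhtml+xml": "render as XHTML",
    "application/svg+xml": "render as SVG",
}


def choose_handler(content_type):
    """Pick a handler from the value of a Content-Type header field."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = content_type.split(";")[0].strip().lower()
    return HANDLERS.get(media_type, "offer to save the data")


host, port, request_text = build_get_request("http://weather.example.com/oaxaca")
print(host, port)
print(choose_handler("application/xhtml+xml; charset=utf-8"))
```

<p>
Keeping the two functions separate mirrors the text: how a representation
is retrieved depends on the URI scheme, while how it is processed depends
on the media type reported with the representation.
</p>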
<p>
The following illustration shows the relationship between identifier,
resource, and representation.
</p>
<figure>
<img alt=
"A resource (Oaxaca Weather Info) is identified by a particular URI and is represented by pseudo-HTML content"
src="images/uri-res-rep.png">
<figcaption>
In the remainder of this document, we highlight important
architectural points regarding Web identifiers, protocols, and
formats. We also discuss some important <a>general architectural
principles</a> and how they apply to the Web.
</figcaption>
</figure>
<section id="about">
<h3>
About this Document
</h3>
<p>
This document describes the properties we desire of the Web and the
design choices that have been made to achieve them. It promotes the
reuse of existing standards when suitable, and gives guidance on how
to innovate in a manner consistent with Web architecture.
</p>
<p>
The terms MUST, MUST NOT, SHOULD, SHOULD NOT, and MAY are used in the
principles, constraints, and good practice notes in accordance with
RFC 2119 [[!RFC2119]].
</p>
<p>
This document does not include conformance provisions for these
reasons:
</p>
<ul>
<li>Conforming software is expected to be so diverse that it would
not be useful to be able to refer to the class of conforming software
agents.
</li>
<li>Some of the good practice notes concern people; specifications
generally define conformance for software, not people.
</li>
<li>We do not believe that the addition of a conformance section is
likely to increase the utility of the document.
</li>
</ul>
<section>
<h4>
Audience of this Document
</h4>
<p>
This document is intended to inform discussions about issues of Web
architecture. The intended audience for this document includes:
</p>
<ol>
<li>Participants in W3C Activities
</li>
<li>Other groups and individuals designing technologies to be
integrated into the Web
</li>
<li>Implementers of W3C specifications
</li>
<li>Web content authors and publishers
</li>
</ol>
<p>
<strong>Note:</strong> This document does not distinguish in any
formal way the terms "language" and "format." Context determines
which term is used. The phrase "specification designer" encompasses
language, format, and protocol designers.
</p>
</section>
<section>
<h4>
Scope of this Document
</h4>
<p>
This document presents the general architecture of the Web. Other
groups inside and outside W3C also address specialized aspects of
Web architecture, including accessibility, quality assurance,
internationalization, device independence, and Web Services. The
section on <a>Architectural Specifications</a> includes references
to these related specifications.
</p>
<p>
This document strives for a balance between brevity and precision
while including illustrative examples. <a href=
"http://www.w3.org/2001/tag/findings">TAG findings</a> are
informational documents that complement the current document by
providing more detail about selected topics. This document includes
some excerpts from the findings. Since the findings evolve
independently, this document includes references to approved TAG
findings. For other TAG issues covered by this document but without
an approved finding, references are to entries in the <a href=
"http://www.w3.org/2001/tag/issues.html">TAG issues list</a>.
</p>
<p>
Many of the examples in this document that involve human activity
suppose the familiar Web interaction model (illustrated at the
beginning of the Introduction) where a person follows a link via a
user agent, the user agent retrieves and presents data, the user
follows another link, etc. This document does not discuss in any
detail other interaction models such as voice browsing (see, for
example, [[!VOICEXML2]]). The choice of interaction model may have
an impact on expected agent behavior. For instance, when a
graphical user agent running on a laptop computer or hand-held
device encounters an error, the user agent can report errors
directly to the user through visual and audio cues, and present the
user with options for resolving the errors. On the other hand, when
someone is browsing the Web through voice input and audio-only
output, stopping the dialog to wait for user input may reduce
usability since it is so easy to "lose one's place" when browsing
with only audio output. This document does not discuss how the
principles, constraints, and good practices identified here apply
in all interaction contexts.
</p>
</section>
<section>
<h4>
Principles, Constraints, and Good Practice Notes
</h4>
<p>
The important points of this document are categorized as follows:
</p>
<dl>
<dt>
Principle
</dt>
<dd>
An architectural principle is a fundamental rule that applies to
a large number of situations and variables. Architectural
principles include "separation of concerns", "generic interface",
"self-descriptive syntax," "visible semantics," "network effect"
(Metcalfe's Law), and Amdahl's Law: "The speed of a system is
limited by its slowest component."
</dd>
<dt>
Constraint
</dt>
<dd>
In the design of the Web, some choices, like the names of the
<code>p</code> and <code>li</code> elements in HTML, the choice
of the colon (:) character in URIs, or grouping bits into
eight-bit units (octets), are somewhat arbitrary; if
<code>paragraph</code> had been chosen instead of <code>p</code>
or asterisk (*) instead of colon, the large-scale result would,
most likely, have been the same. This document focuses on more
fundamental design choices: design choices that lead to
constraints, i.e., restrictions in behavior or interaction within
the system. Constraints may be imposed for technical, policy, or
other reasons to achieve desirable properties in the system, such
as accessibility, global scope, relative ease of evolution,
efficiency, and dynamic extensibility.
</dd>
<dt>
Good practice
</dt>
<dd>
Good practice—by software developers, content authors, site
managers, users, and specification designers—increases the value
of the Web.
</dd>
</dl>
</section>
</section>
</section>
<section id="identification">
<h2>
Identification
</h2>
<p>
In order to communicate internally, a community agrees (to a reasonable
extent) on a set of terms and their meanings. One goal of the Web,
since its inception, has been to build a global community in which any
party can share information with any other party. To achieve this goal,
the Web makes use of a single global identification system: the URI.
URIs are a cornerstone of Web architecture, providing identification
that is common across the Web. The global scope of URIs promotes
large-scale "network effects": the value of an identifier increases the
more it is used consistently (for example, the more it is used in
<a>hypertext links</a>).
</p>
<div class="boxedtext">
<p>
<span class="principlelab">Principle: <a id="pr-global-id">Global
Identifiers</a></span>
</p>
<p class="principle">
Global naming leads to global network effects.
</p>
</div>
<p>
This principle dates back at least as far as Douglas Engelbart's
seminal work on open hypertext systems; see section <a href=
"http://www.bootstrap.org/augdocs/augment-132082.htm#11K">Every Object
Addressable</a> in [[!Eng90]].
</p>
<section id="uri-benefits">
<h3>
Benefits of URIs
</h3>
<p>
The choice of syntax for global identifiers is somewhat arbitrary; it
is their global scope that is important. The <dfn>Uniform Resource
Identifier</dfn>, [[!URI]], has been successfully deployed since the
creation of the Web. There are substantial benefits to participating
in the existing network of URIs, including linking, bookmarking,
caching, and indexing by search engines, and there are substantial
costs to creating a new identification system that has the same
properties as URIs.
</p>
<div class="boxedtext">
<p>
<span class="practicelab">Good practice: <a id=
"pr-use-uris">Identify with URIs</a></span>
</p>
<p class="practice">
To benefit from and increase the value of the World Wide Web,
agents should provide URIs as identifiers for resources.
</p>
</div>
<p>
A resource should have an associated URI if another party might
reasonably want to create a hypertext link to it, make or refute
assertions about it, retrieve or cache a representation of it,
include all or part of it by reference into another representation,
annotate it, or perform other operations on it. Software developers
should expect that sharing URIs across applications will be useful,
even if that utility is not initially evident. The TAG finding
<cite>"<a href=
"http://www.w3.org/2001/tag/doc/whenToUseGet.html">URIs,
Addressability, and the use of HTTP GET and POST</a>"</cite>
discusses additional benefits and considerations of URI
addressability.
</p>
<p>
<strong>Note:</strong> Some URI schemes (such as the "ftp" URI scheme
specification) use the term "designate" where this document uses
"identify."
</p>
</section>
<section id="id-resources">
<h3>
URI/Resource Relationships
</h3>
<p>
By design a URI identifies one resource. We do not limit the scope of
what might be a <dfn>resource</dfn>. The term "resource" is used in a
general sense for whatever might be identified by a URI. It is
conventional on the hypertext Web to describe Web pages, images,
product catalogs, etc. as “resources”. The distinguishing
characteristic of these resources is that all of their essential
characteristics can be conveyed in a message. We identify this set as
“<dfn>information resources</dfn>.”
</p>
<p>
This document is an example of an information resource. It consists
of words and punctuation symbols and graphics and other artifacts
that can be encoded, with varying degrees of fidelity, into a
sequence of bits. There is nothing about the essential information
content of this document that cannot in principle be transferred in a
message. In the case of this document, the message payload is the
<a>representation</a> of this document.
</p>
<p>
However, our use of the term resource is intentionally more broad.
Other things, such as cars and dogs (and, if you've printed this
document on physical sheets of paper, the artifact that you are
holding in your hand), are resources too. They are not information
resources, however, because their essence is not information.
Although it is possible to describe a great many things about a car
or a dog in a sequence of bits, the sum of those things will
invariably be an approximation of the essential character of the
resource.
</p>
<p>
We define the term “information resource” because we observe that it
is useful in discussions of Web technology and may be useful in
constructing specifications for facilities built for use on the Web.
</p>
<div class="boxedtext">
<p>
<span class="constraintlab">Constraint: <a id=
"pr-uri-collision">URIs Identify a Single Resource</a></span>
</p>
<p class="constraint">
Assign distinct URIs to distinct resources.
</p>
</div>
<p>
Since the scope of a URI is global, the resource identified by a URI
does not depend on the context in which the URI appears (see also the
section about <a class="section">indirect identification</a>).
</p>
<p>
[[!URI]] is an agreement about how the Internet community allocates
names and associates them with the resources they identify. URIs are
divided into <a>schemes</a> that define, via their scheme
specification, the mechanism by which scheme-specific identifiers are
associated with resources. For example, the "http" URI scheme
([[!RFC2616]]) uses DNS and TCP-based HTTP servers for the purpose of
identifier allocation and resolution. As a result, identifiers such
as "http://example.com/somepath#someFrag" often take on meaning
through the community experience of performing an HTTP GET request on
the identifier and, if given a successful response, interpreting the
response as a representation of the identified resource. (See also
<a>Fragment Identifiers</a>.) Of course, a retrieval action like GET
is not the only way to obtain information about a resource. One might
also publish a document that purports to define the meaning of a
particular URI. These other sources of information may suggest
meanings for such identifiers, but it's a local policy decision
whether those suggestions should be heeded.
</p>
<p>
Just as one might wish to refer to a person by different names (by
full name, first name only, sports nickname, romantic nickname, and
so forth), Web architecture allows the association of more than one
URI with a resource. URIs that identify the same resource are called
<dfn>URI aliases</dfn>. The section on <a>URI aliases</a> discusses
some of the potential costs of creating multiple URIs for the same
resource.
</p>
<p>
Several sections of this document address questions about the
relationship between URIs and resources, including:
</p>
<ul>
<li>How much can I tell about a resource by inspection of a URI that
identifies it? See the sections on <a>URI schemes</a> and <a>URI
opacity</a>.
</li>
<li>Who determines what resource a URI identifies? See the section on
<a>URI allocation</a>.
</li>
<li>Can the resource identified by a URI change over time? See the
sections on <a>URI persistence</a> and <a>representation
management</a>.
</li>
<li>Since more than one URI can identify the same resource, how do I
know which URIs identify the same resource? See the sections on
<a>URI comparison</a> and <a>assertions that two URIs identify the
same resource</a>.
</li>
</ul>
<section>
<h4 id="URI-collision">
URI collision
</h4>
<p>
By design, a URI identifies one resource. Using the same URI to
directly identify different resources produces a <dfn>URI
collision</dfn>. Collision often imposes a cost in communication
due to the effort required to resolve ambiguities.
</p>
<p>
Suppose, for example, that one organization makes use of a URI to
refer to the movie <cite>The Sting</cite>, and another organization
uses the same URI to refer to a discussion forum about <cite>The
Sting</cite>. To a third party, aware of both organizations, this
collision creates confusion about what the URI identifies,
undermining the value of the URI. If one wanted to talk about the
creation date of the resource identified by the URI, for instance,
it would not be clear whether this meant "when the movie was
created" or "when the discussion forum about the movie was
created."
</p>
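<p>
The difference between a URI collision and a URI alias can be made concrete
with a small sketch. The <code>UriRegistry</code> class, the URIs, and the
resource descriptions below are hypothetical and exist only for
illustration; on the Web no such central registry exists, and collisions
are avoided through ownership and social convention rather than a runtime
check.
</p>

```python
class CollisionError(Exception):
    """Raised when one URI would directly identify two different resources."""


class UriRegistry:
    """Hypothetical bookkeeping of which resource each URI identifies."""

    def __init__(self):
        self._resource_for = {}

    def assign(self, uri, resource):
        """Associate a URI with a resource, refusing any collision."""
        current = self._resource_for.get(uri)
        if current is not None and current != resource:
            raise CollisionError(f"{uri} already identifies {current!r}")
        self._resource_for[uri] = resource

    def aliases_of(self, resource):
        """Return all URIs that identify the given resource (URI aliases)."""
        return sorted(u for u, r in self._resource_for.items() if r == resource)


registry = UriRegistry()
registry.assign("http://movies.example.org/the-sting", "the movie The Sting")
# A second URI for the same resource is an alias, not a collision.
registry.assign("http://movies.example.org/films/the-sting", "the movie The Sting")

try:
    # Reusing the first URI for a different resource is a collision.
    registry.assign("http://movies.example.org/the-sting", "a forum about The Sting")
    collided = False
except CollisionError:
    collided = True

print(collided)
```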
<p>
Social and technical solutions have been devised to help avoid URI
collision. However, the success or failure of these different
approaches depends on the extent to which there is consensus in the
Internet community on abiding by the defining specifications.
</p>
<p>
The section on <a class="addrefnb">URI allocation</a> examines
approaches for establishing the authoritative source of information
about what resource a URI identifies.
</p>
<p>
URIs are sometimes used for <a>indirect identification</a>. This
does not necessarily lead to collisions.
</p>
</section>
<section id="uri-assignment">
<h4>
URI allocation
</h4>
<p>
URI allocation is the process of associating a URI with a resource.
Allocation can be performed both by resource owners and by other
parties. It is important to avoid <a>URI collision</a>.
</p>
<section id="uri-ownership">
<h5>
URI ownership
</h5>
<p>
<dfn>URI ownership</dfn> is a relation between a URI and a social
entity, such as a person, organization, or specification. URI
ownership gives the relevant social entity certain rights,
including:
</p>
<ol>
<li>to pass on ownership of some or all owned URIs to another
owner—delegation; and
</li>
<li>to associate a resource with an owned URI—URI allocation.
</li>
</ol>
<p>
By social convention, URI ownership is delegated from the IANA
URI scheme registry [[!IANASchemes]], itself a social entity, to
IANA-registered URI scheme specifications. Some URI scheme
specifications further delegate ownership to subordinate
registries or to other nominated owners, who may further delegate
ownership. In the case of a specification, ownership ultimately
lies with the community that maintains the specification.
</p>
<p>
The approach taken for the "http" URI scheme, for example,
follows the pattern whereby the Internet community delegates
authority, via the IANA URI scheme registry and the DNS, over a
set of URIs with a common prefix to one particular owner. One
consequence of this approach is the Web's heavy reliance on the
central DNS registry. A different approach is taken by the URN
Syntax scheme [[!RFC2141]], which delegates ownership of portions
of URN space to URN Namespace specifications, which themselves are
registered in an IANA-maintained registry of URN Namespace
Identifiers.
</p>
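<p>
Delegation of a set of URIs with a common prefix to one owner, with
possible further delegation, can be pictured as a longest-prefix lookup.
The table and owner names below are invented for illustration, and the
sketch deliberately ignores how the DNS and the IANA registry actually
record delegations.
</p>

```python
# Hypothetical delegation table: URI prefix mapped to the owner of every
# URI under that prefix. A longer prefix models a further delegation.
DELEGATIONS = {
    "http://example.com/": "Example Corp",
    "http://example.com/~nadia/": "Nadia",
    "http://weather.example.com/": "Example Weather Service",
}


def owner_of(uri):
    """Return the owner whose delegated prefix matches the URI most specifically."""
    matches = [prefix for prefix in DELEGATIONS if uri.startswith(prefix)]
    if not matches:
        return None
    # The longest matching prefix is the most specific delegation.
    return DELEGATIONS[max(matches, key=len)]


print(owner_of("http://example.com/~nadia/photos"))
print(owner_of("http://weather.example.com/oaxaca"))
```

<p>
Because each URI resolves to exactly one most-specific prefix, each URI
has a single owner, which is the property the text above relies on to
reduce the likelihood of collisions.
</p>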
<p>
URI owners are responsible for avoiding the assignment of
equivalent URIs to multiple resources. Thus, if a URI scheme
specification does provide for the delegation of individual or
organized sets of URIs, it should take pains to ensure that
ownership ultimately resides in the hands of a single social
entity. Allowing multiple owners increases the likelihood of URI
collisions.
</p>
<p>
URI owners may organize or deploy infrastructure to ensure that
representations of associated resources are available and, where
appropriate, interaction with the resource is possible through
the exchange of representations. There are social expectations
for responsible <a class="addrefnb">representation management</a>
by URI owners. Additional social implications of URI ownership
are not discussed here.
</p>
<p>
See TAG issue <a href=
"http://www.w3.org/2001/tag/issues.html#siteData-36">siteData-36</a>,
which concerns the expropriation of naming authority.
</p>
</section>
<section id="assign-other-schemes">
<h5>
Other allocation schemes
</h5>
<p>
Some schemes use techniques other than delegated ownership to
avoid collision. For example, the specification for the data URL
(sic) scheme [[!RFC2397]] specifies that the resource identified
by a data scheme URI has only one possible representation. The
representation data makes up the URI that identifies that
resource. Thus, the specification itself determines how data URIs
are allocated; no delegation is possible.
</p>
<p>
Other schemes (such as "news:comp.text.xml") rely on a social
process.
</p>
</section>
</section>
<section id="indirect-identification">
<h4>
Indirect Identification
</h4>
<p>
To say that the URI "mailto:nadia@example.com" identifies both an
Internet mailbox and Nadia, the person, introduces a URI collision.
However, we can use the URI to indirectly identify Nadia.
Identifiers are commonly used in this way.
</p>
<p>
Listening to a news broadcast, one might hear a report on Britain
that begins, "Today, 10 Downing Street announced a series of new
economic measures." Generally, "10 Downing Street" identifies the
official residence of Britain's Prime Minister. In this context,
the news reporter is using it (as English rhetoric allows) to
indirectly identify the British government. Similarly, URIs
identify resources, but they can also be used in many constructs to
indirectly identify other resources. Globally adopted assignment
policies make some URIs appealing as general-purpose identifiers.
Local policy establishes what they indirectly identify.
</p>
<p>
Suppose that <code>nadia@example.com</code> is Nadia's email
address. The organizers of a conference Nadia attends might use
"mailto:nadia@example.com" to refer indirectly to her (e.g., by
using the URI as a database key in their database of conference
participants). This does not introduce a URI collision.
</p>
</section>
</section>
<section id="identifiers-comparison">
<h3>
URI Comparisons
</h3>
<p>
URIs that are identical, character-by-character, refer to the same
resource. Since Web Architecture allows the association of multiple
URIs with a given resource, two URIs that are not
character-by-character identical may still refer to the same
resource. Different URIs do not necessarily refer to different
resources, but determining that different URIs refer to the same
resource generally carries a higher computational cost.
</p>
<p>
To reduce the risk of a false negative (i.e., an incorrect conclusion
that two URIs do not refer to the same resource) or a false positive
(i.e., an incorrect conclusion that two URIs do refer to the same
resource), some specifications describe equivalence tests in addition
to character-by-character comparison. Agents that reach conclusions
based on comparisons that are not licensed by the relevant
specifications take responsibility for any problems that result; see
the section on <a class="addrefnb">error handling</a> for more
information about responsible behavior when reaching unlicensed
conclusions. Section 6 of [[!URI]] provides more information about
comparing URIs and reducing the risk of false negatives and
positives.
</p>
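<p>
The difference between character-by-character comparison and a
specification-licensed equivalence test can be sketched as follows.
This example assumes only the case normalization of scheme and host
described in section 6 of [[!URI]]; the function names are
illustrative, and a fuller comparison would apply further
normalization steps.
</p>
<pre>
```python
from urllib.parse import urlsplit, urlunsplit


def same_by_characters(a: str, b: str) -> bool:
    """Cheapest test: identical strings always refer to the same resource."""
    return a == b


def normalize_case(uri: str) -> str:
    """Lowercase the scheme and host, which are case-insensitive."""
    parts = urlsplit(uri)
    # Leave any userinfo untouched; only the host (and port) is lowercased.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    netloc = userinfo + at + hostport.lower()
    return urlunsplit(
        (parts.scheme.lower(), netloc, parts.path, parts.query, parts.fragment)
    )


def equivalent(a: str, b: str) -> bool:
    """Costlier test: compare after case normalization."""
    return normalize_case(a) == normalize_case(b)


a = "HTTP://Weather.Example.COM/oaxaca"
b = "http://weather.example.com/oaxaca"
print(same_by_characters(a, b), equivalent(a, b))
```
</pre>
<p>
The path is deliberately left alone: treating "/oaxaca" and "/Oaxaca"
as equivalent is not licensed by the generic syntax and would risk a
false positive.
</p>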
<p>
See also the <a>assertion that two URIs identify the same
resource</a>.
</p>
<section id="uri-aliases">
<h4>
URI aliases
</h4>
<p>
Although there are benefits (such as naming flexibility) to URI
aliases, there are also costs. URI aliases are harmful when they
divide the Web of related resources. A corollary of Metcalfe's
Principle (the "network effect") is that the value of a given
resource can be measured by the number and value of other resources
in its network neighborhood, that is, the resources that link to
it.
</p>
<p>
The problem with aliases is that if half of the neighborhood points
to one URI for a given resource, and the other half points to a
second, different URI for that same resource, the neighborhood is
divided. Not only is the aliased resource undervalued because of
this split, but the entire neighborhood of resources also loses value
because of the missing second-order relationships that should have
existed among the referring resources by virtue of their references
to the aliased resource.
</p>
<div class="boxedtext">
<p>
<span class="practicelab">Good practice: <a id=
"avoid-uri-aliases">Avoiding URI aliases</a></span>
</p>
<p class="practice">
A URI owner SHOULD NOT associate arbitrarily different URIs with
the same resource.
</p>
</div>
<p>
URI consumers also have a role in ensuring URI consistency. For
instance, when transcribing a URI, agents should not gratuitously
percent-encode characters. The term "character" refers to URI
characters as defined in section 2 of [[!URI]]; percent-encoding is
discussed in section 2.1 of that specification.
</p>
<div class="boxedtext">
<p>
<span class="practicelab">Good practice: <a id=
"lc-uri-chars">Consistent URI usage</a></span>
</p>
<p class="practice">
An agent that receives a URI SHOULD refer to the associated
resource using the same URI, character-by-character.
</p>
</div>
<p>
When a URI alias does become common currency, the <a href=
"#uri-ownership">URI owner</a> should use protocol techniques such
as server-side redirects to relate the two URIs. The community
benefits when the URI owner supports redirection of an aliased URI
to the corresponding "official" URI. For more information on
redirection, see section 10.3, Redirection, in [[!RFC2616]]. See
also [[!CHIPS]] for a discussion of some best practices for server
administrators.
</p>
</section>
<section id="representation-reuse">
<h4>
Representation reuse
</h4>
<p>
URI aliasing only occurs when more than one URI is used to identify
the same resource. The fact that different resources sometimes have
the same representation does not make the URIs for those resources
aliases.
</p>
<div class="boxedtext">
<p>
<span class="storylab">Story</span>
</p>
<div class="story">
<p>
Dirk would like to add a link from his Web site to the Oaxaca
weather site. He uses the URI http://weather.example.com/oaxaca
and labels his link “report on weather in Oaxaca on
1 August 2004”. Nadia points out to Dirk that he is
setting misleading expectations for the URI he has used. The
Oaxaca weather site policy is that the URI in question
identifies a report on the current weather in Oaxaca—on any
given day—and not the weather on 1 August. Of course, on the
first of August in 2004, Dirk's link will be correct, but the
rest of the time he will be misleading readers. Nadia points
out to Dirk that the managers of the Oaxaca weather site do
make available a different URI permanently assigned to a
resource reporting on the weather on 1 August 2004.
</p>
</div>
</div>
<p>
In this story, there are two resources: “a report on the current
weather in Oaxaca” and “a report on the weather in Oaxaca on
1 August 2004”. The managers of the Oaxaca weather site
assign two URIs to these two different resources. On