Website parsing from the point of view of the law: one of the most useful IT tools in the world (and in Russia)?


Let's consider one of the best ways of gathering information online, parsing, from a legal point of view. Attention! This publication covers some common legal issues related to parsing, but it is not legal advice. The article continues the publication "10 tools to parse information from web sites, including the prices of competitors + legal assessment of Russia".

Parsing is the automated process of extracting data from someone else's website. But is it one of the most useful IT tools for data collection, or a trap that leads to inevitable problems with the law? Parsing can certainly be one of the most efficient ways of collecting content across the web, but it comes with a caveat: the tool is very difficult to assess from a legal perspective. In parsing, a piece of software automatically retrieves data from a website by "combing" through numerous pages. Search engines like Google and Bing do something similar when they index web pages, and parsing tools go further: they convert the information into a format that lets you load the data into a database or spreadsheet.
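To make the "extract and convert into a spreadsheet" step more concrete, here is a minimal sketch using only Python's standard library. It assumes a hypothetical page layout in which each product sits in an element with class `product` that contains elements with classes `name` and `price`. Real sites differ, and `ProductParser` and `to_csv` are illustrative names, not part of any existing service. Downloading the page is left out.

```python
import csv
import io
from html.parser import HTMLParser


class ProductParser(HTMLParser):
    """Collects (name, price) pairs from a hypothetical markup such as
    <div class="product"><span class="name">...</span><span class="price">...</span></div>
    """

    FIELDS = ("name", "price")  # assumed class names, not a real site's layout

    def __init__(self):
        super().__init__()
        self.rows = []       # finished (name, price) records
        self._field = None   # field whose text is currently being read
        self._current = {}   # partially collected record

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for field in self.FIELDS:
            if field in classes:
                self._field = field

    def handle_data(self, data):
        if self._field is not None:
            self._current[self._field] = self._current.get(self._field, "") + data

    def handle_endtag(self, tag):
        if self._field is None:
            return
        self._field = None
        # Emit a row once both fields of the current product have been seen.
        if all(f in self._current for f in self.FIELDS):
            self.rows.append(tuple(self._current[f].strip() for f in self.FIELDS))
            self._current = {}


def to_csv(rows):
    """Serialise the collected rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["name", "price"])
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    page = (
        '<div class="product"><span class="name">Green tea</span>'
        '<span class="price">199.00</span></div>'
        '<div class="product"><span class="name">Coffee</span>'
        '<span class="price">450.50</span></div>'
    )
    parser = ProductParser()
    parser.feed(page)
    print(parser.rows)  # [('Green tea', '199.00'), ('Coffee', '450.50')]
    print(to_csv(parser.rows))
```

The resulting CSV text can be opened in a spreadsheet or loaded into a database table, which is the format conversion described above.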

Parsing is not the same as an API. A company can open API access so that other systems can interact with its data; however, the quality and quantity of data available through an API are, as a rule, lower than what can be obtained by parsing. In addition, parsing provides more up-to-date information than an API and is much easier to set up from a structural point of view.

Parsed information has a very wide range of uses. A sports journalist can use parsing to collect baseball statistics for an article. Or, in e-commerce, you can retrieve product names and prices from different sources for later analysis (an example in Russia is the open service for parsing and monitoring competitor prices, xmldatafeed.com).

But although parsing is undoubtedly a powerful tool, it can cause difficulties when it comes to legal issues. Because parsing takes existing content from other sources, those who use this tool face ethical and legal difficulties.

To date, there is no clearly defined legal framework for parsing, and the area is in constant motion, but you can roughly outline the areas of greatest risk. Below are the most striking US court cases that became precedents.

2000-2009: eBay


For quite a while after parsing emerged, it caused no legal problems. But in 2000 the use of this tool provoked a real battle: eBay against Bidder's Edge, a company that collected auction data. eBay accused Bidder's Edge of illegally extracting data, relying on the doctrine of trespass to chattels (trespass to movable property). The judge sided with the plaintiff, stating that heavy activity by the robot programs could undermine eBay's operations.

Then in 2003, in Intel v. Hamidi, the California Supreme Court rejected the reasoning that eBay had used against Bidder's Edge, ruling that the doctrine of trespass to chattels cannot be extended to the computer environment unless actual damage to personal property has been caused.

All the earliest cases against parsing relied on the doctrine of trespass to chattels, and the plaintiffs won them. But this approach is no longer effective.

2009: Facebook


In 2009, Facebook sued Power.com, a website that brings various social networks together into one centralized resource, after Power.com added Facebook to its service. Since Power.com parsed content from Facebook instead of following the giant's established standards, Facebook sued on the basis of copyright infringement. Facebook accused Power.com of copying the Facebook website in the process of fetching user information and argued that this constituted direct and indirect copyright infringement. The court ruled in favor of Facebook, and since then decisions on the legality of parsing have tended to favor the owners of the content on the sites.

Even if the parser only processes publicly available information, its actions can be characterized as copyright infringement, because technically the content is still "copied".

2011-2014: Auernheimer


In 2010, hacker Andrew Auernheimer found a hole in the security of the AT&T website and extracted the email addresses of users who had accessed the site from their iPads. By exploiting the lack of security and using parsing, Auernheimer was able to obtain thousands of email addresses from the AT&T website. He was convicted of unauthorized access to AT&T's servers and of misappropriating other people's data.

Using parsing to extract sensitive personal information can lead to prosecution, even if that information was nominally public. You can try to convince the court that no passwords, no codes and no forced entry were used to access the information, but this is dangerous territory.

2013: Meltwater


Meltwater is a software company whose product, Global Media Monitoring, uses parsing to gather news. The Associated Press sued Meltwater for parsing articles, some of which were protected by copyright, and for misappropriation of news. Facts cannot be copyrighted, but the court decided that copying the articles themselves, with the author's presentation of the facts, was illegal. In addition, Meltwater's use of the articles did not meet fair use standards. Copyrighted content cannot always be parsed!

2014: QVC


In 2014, QVC (a well-known TV retailer) sued Resultly (a shopping app) over what QVC called "over-parsing". QVC alleged that Resultly disguised its search robots to hide their original IP addresses, so QVC could not block the parsers. Because the bots were quite aggressive toward QVC's servers, they caused an overload and an outage that resulted in about US$2 million in damages. The court ruled in favor of Resultly, finding that there was no intent to cause harm.

What about Russia?


Let's start with the simplest and most common question: taking photos of price tags in stores. Although it is not directly related to parsing websites, the issues are similar (indeed, there seems to be no difference between photographing price tags in a store and parsing prices from competitors' websites).

So the question is: can a store impose a rule on customers prohibiting unauthorized photography or filming on its premises? Without going into a detailed interpretation of the law, let's look at the most important article about information:

In accordance with Article 5 of the Federal Law "On Information, Information Technologies and Protection of Information":

1. Information may be the object of public, civil and other legal relations. Information may be freely used by any person and transmitted by one person to another, unless federal laws restrict access to the information or establish other requirements for the procedure of its provision or distribution.


3. Depending on the procedure for its provision or distribution, information is divided into:

1) freely distributable information;
2) information provided by agreement of the persons participating in the relevant relations;
3) information that, in accordance with federal laws, is subject to provision or dissemination;
4) information whose dissemination in the Russian Federation is restricted or prohibited.

4. The legislation of the Russian Federation may establish types of information depending on its content or owner.

Thus, information about prices in shops is public, because no legislation restricts access to it. Accordingly, writing down or photographing prices in a store is not prohibited.

Indeed, there is no violation of the law. Moreover, Article 29 of the Constitution of the Russian Federation enshrines everyone's right to "freely seek, receive, transmit, produce and disseminate information by any lawful means."

Now to parsing websites. We asked a law firm ("the mill and partners") the following question: "May an organization carry out automated collection of information posted in open access on websites on the Internet (parsing)?"

Under the applicable law of the Russian Federation, everything that is not prohibited by law is permitted. Parsing websites is legitimate as long as it does not violate the prohibitions established by law. Thus, automated collection of information must comply with applicable laws. The legislation of the Russian Federation establishes the following restrictions relevant to the Internet:

  • Violation of copyright and related rights is not allowed.
  • Unlawful access to legally protected computer information is not allowed.
  • Collection of information constituting a commercial secret by illegal means is not allowed.
  • Deliberately dishonest exercise of civil rights (abuse of rights) is not allowed.
  • Use of civil rights for the purpose of restricting competition is not allowed.

It follows from the above prohibitions that an organization may carry out automated gathering of information (parsing) posted in open access on Internet sites if all of the following conditions are met:

  • The information is publicly accessible and not protected by legislation on copyright and related rights.
  • Automated data collection is carried out by lawful means.
  • Automated information gathering does not disrupt the operation of websites on the Internet.
  • Automated collection of information does not lead to restriction of competition.
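Technically, the condition that parsing must not disrupt a site's operation is mostly about request rate, and a common approach is to pause between requests to the same host. The sketch below is hypothetical: the `HostThrottle` class and its default `min_interval` of 2 seconds are illustrative assumptions, not a legal threshold and not a value taken from any law or court decision.

```python
import time
from urllib.parse import urlsplit


class HostThrottle:
    """Enforces a minimum pause between consecutive requests to the same host.

    Hypothetical helper: min_interval (seconds) is an illustrative default,
    not a legally defined "safe" rate.
    """

    def __init__(self, min_interval=2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = {}  # host -> moment the previous request to it was allowed

    def wait(self, url):
        """Block until `url` may be requested; return the number of seconds slept."""
        host = urlsplit(url).netloc
        now = self._clock()
        last = self._last.get(host)
        delay = 0.0
        if last is not None:
            delay = max(0.0, self.min_interval - (now - last))
        if delay > 0:
            self._sleep(delay)
        self._last[host] = now + delay
        return delay


if __name__ == "__main__":
    throttle = HostThrottle(min_interval=0.5)
    for url in ["https://shop.example/a", "https://shop.example/b"]:
        throttle.wait(url)
        print("would fetch", url)  # the actual HTTP request is omitted
```

A crawler would call `wait(url)` before every request; the injectable `clock` and `sleep` arguments exist only to make the class easy to test without real delays.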

There are recommendations you should follow if you use parsing:

  • The extracted content must not be protected by copyright.
  • The parsing process should not interfere with the operation of the site being parsed.
  • Parsing should not violate the website's terms of use.
  • The parser should not extract users' personal information.
  • Content subjected to parsing must meet fair use standards.
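For the recommendation not to extract users' personal information, one defensive measure is to redact obvious personal data from scraped text before it is stored. This is only a rough sketch: the hypothetical `strip_personal_data` function and its regular expressions catch typical email addresses and phone numbers, miss many other formats, and passing text through it does not by itself make data processing lawful.

```python
import re

# Rough, illustrative patterns (assumptions): they will miss many formats
# and may occasionally match long non-personal numbers.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s()-]{8,}\d")


def strip_personal_data(text, placeholder="[removed]"):
    """Replace email addresses and phone-like numbers in scraped text."""
    text = EMAIL_RE.sub(placeholder, text)
    return PHONE_RE.sub(placeholder, text)


if __name__ == "__main__":
    review = "Great tea! Contact ivan@example.com or +7 (495) 123-45-67. Price 199"
    print(strip_personal_data(review))
    # Great tea! Contact [removed] or [removed]. Price 199
```

Short numbers such as prices are left alone because the phone pattern requires at least ten characters.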

P.S. The most "delicate" point is a possible claim that "parsing interferes with the operation of our website and causes us losses". In response, one can point out that search engines such as Google and Yandex parse (index) entire sites and collect all available information, and do so quite regularly. Accordingly, it seems logical to me that a parser visiting a website to collect pricing information has the same technical effect. It may be difficult to prove that the same action hinders the site's operation when a parser performs it but not when a search engine does. In any case, a good parser should follow the rules in robots.txt...
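Checking robots.txt before fetching a page is straightforward with Python's standard `urllib.robotparser` module. The sketch below parses a robots.txt body that has already been downloaded (here an inline sample) and reports whether a URL may be fetched and what `Crawl-delay` the site requests. The helper name `robots_rules`, the user-agent name `PriceMonitorBot` and the host `shop.example` are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def robots_rules(robots_txt, user_agent):
    """Parse an already-downloaded robots.txt body.

    Returns (can_fetch, crawl_delay): a predicate for URLs and the
    Crawl-delay requested for `user_agent` (None if the file sets none).
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    def can_fetch(url):
        return parser.can_fetch(user_agent, url)

    return can_fetch, parser.crawl_delay(user_agent)


if __name__ == "__main__":
    # Sample robots.txt for a hypothetical shop; a real crawler would
    # download https://<site>/robots.txt first.
    sample = "User-agent: *\nDisallow: /checkout/\nCrawl-delay: 5\n"
    allowed, delay = robots_rules(sample, "PriceMonitorBot")
    print(allowed("https://shop.example/catalog/tea"))    # True
    print(allowed("https://shop.example/checkout/cart"))  # False
    print(delay)                                          # 5
```

If the site specifies a `Crawl-delay`, a parser can use it as the minimum pause between its requests.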
Article based on information from habrahabr.ru
