Technical Site Audit with Screaming Frog SEO Spider

Small Update – Version 13.2 Released 4th August 2020

We have just released a small update to version 13.2 of the SEO Spider. This release is mainly bug fixes and small improvements –

We first released custom search back in 2011 and it was in need of an upgrade. So we’ve updated functionality to allow you to search within specific elements, entire tracking tags and more. Check out our custom search tutorial.
Sped up near duplicate crawl analysis.
Google Rich Results Features Summary export has been ordered by number of URLs.
Fix bug with Near Duplicates Filter not being populated when importing a .seospider crawl.
Fix several crashes in the UI.
Fix PSI CrUX data incorrectly labelled as sec.
Fix spell checker incorrectly checking some script content.
Fix crash showing near duplicates details panel.
Fix issue preventing users with dual stack networks from crawling on Windows.
Fix crash using Wacom tablet on Windows 10.
Fix spellchecker filters missing when reloading a crawl.
Fix crash on macOS around multiple screens.
Fix crash viewing gif in the image details tab.
Fix crash canceling during database crawl load.

Auditing XML Sitemaps in Screaming Frog SEO Spider


On the right, in the Sitemaps section of the Overview tab, you get comprehensive data on the XML sitemap:

This gives us data under the following filters:

URLs in Sitemap: pages that exist on the site and are included in the XML sitemap. Only optimized, canonical pages that are open to indexing should appear here;
URLs Not in Sitemap: pages that are available on the site but are not included in the XML sitemap. For example, tag and author pages in WordPress that are hidden from search;
Orphan URLs: pages that are found only in the XML sitemap and were not reached by the crawler through the site's own links. This is a search optimization error;
Non-Indexable URLs in Sitemap: pages that are in the XML sitemap but closed to search. This is also an error, because a sitemap should not contain pages blocked from indexing;
URLs In Multiple Sitemaps: pages that appear in several XML sitemaps at once. As a rule, a page should be in only one sitemap;
XML Sitemap With Over 50k URLs: flags large XML sitemaps with more than 50,000 URLs;
XML Sitemap With Over 50mb: the same, but for sitemaps larger than 50 MB.

These filters help make sure the XML sitemap includes only high-quality, indexable, canonical URLs. Search engines do not tolerate "dirt" in XML sitemaps well, for example sitemaps that contain errors, redirects or non-indexable URLs. They place less trust in such sitemaps when crawling and indexing.

That is why it is important to keep every page listed in the XML file in working order.
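
The set logic behind the first few filters can be sketched in a few lines of Python. This is only an illustration, not how the SEO Spider computes its reports: the function names, the example URLs and the list of crawled URLs are all assumptions made for the sketch.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard XML sitemaps.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
MAX_URLS = 50_000  # the 50k URL threshold named in the filter list


def sitemap_urls(xml_text):
    """Return every <loc> value found in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


def compare_sitemap(sitemap, crawled):
    """Split URLs into the buckets described above (hypothetical helper)."""
    in_sitemap = set(sitemap)
    found_in_crawl = set(crawled)
    return {
        "urls_in_sitemap": sorted(in_sitemap & found_in_crawl),
        "urls_not_in_sitemap": sorted(found_in_crawl - in_sitemap),
        "orphan_urls": sorted(in_sitemap - found_in_crawl),
        "over_50k_urls": len(sitemap) > MAX_URLS,
    }
```

Feeding it the `<loc>` values of one sitemap and the URLs found by a crawl returns the three URL buckets plus a flag for the 50k limit.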

You can also view the XML sitemap in list mode, with a range of filters and metrics:

You can download the list in .xls format with the ‘Export’ button. A directory tree view is also available:

Selecting a URL from the list lets you see the linking page and anchor text on the Inlinks tab:

You can export all Inlinks to Excel through the Bulk Export -> Sitemaps menu:

5) Aggregated Site Structure


The SEO Spider now displays the number of URLs discovered in each directory when in directory tree view (which you can access via the tree icon next to ‘Export’ in the top tabs).

This helps better understand the size and architecture of a website, and some users find it more logical to use than traditional list view.

Alongside this update, we’ve improved the right-hand ‘Site Structure’ tab to show an aggregated directory tree view of the website. This helps quickly visualise the structure of a website, and identify where issues are at a glance, such as indexability of different paths.

If you’ve found areas of a site with non-indexable URLs, you can switch the ‘view’ to analyse the ‘indexability status’ of those different path segments to see the reasons why they are considered as non-indexable.

You can also toggle the view to crawl depth across directories to help identify any internal linking issues to areas of the site, and more.

This wider aggregated view of a website should help you visualise the architecture, and make better decisions for different sections and segments.
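
To get a feel for what an aggregated directory count looks like outside the tool, here is a rough Python sketch. It assumes the last path segment of each URL is the page itself and everything before it is a directory; the function name and example URLs are made up for illustration, and the SEO Spider's own aggregation may differ.

```python
from collections import Counter
from urllib.parse import urlsplit


def directory_counts(urls):
    """Count how many URLs sit at or below each directory path."""
    counts = Counter()
    for url in urls:
        segments = [s for s in urlsplit(url).path.split("/") if s]
        path = "/"
        counts[path] += 1  # every URL belongs to the site root
        # Treat all but the last segment as parent directories.
        for segment in segments[:-1]:
            path += segment + "/"
            counts[path] += 1
    return counts
```

Sorting the result by count gives a quick view of which sections of a site hold the most URLs.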

What Is SEO?


Search Engine Optimisation (SEO) is the practice of increasing the number and quality of visitors to a website by improving rankings in the algorithmic search engine results.

Research shows that websites on the first page of Google receive almost 95% of clicks, and studies show that results that appear higher up the page receive an increased click through rate (CTR), and more traffic.

The algorithmic (‘natural’, ‘organic’, or ‘free’) search results are those that appear directly below the top pay-per-click adverts in Google, as highlighted below.

There are also various other listings that can appear in the Google search results, such as map listings, videos, the knowledge graph and more. SEO can include improving visibility in these result sets as well.

1) SERP Snippets Now Editable

First of all, the SERP snippet tool we released in our previous version has been updated extensively to include a variety of new features. The tool now allows you to preview SERP snippets by device type (whether it’s desktop, tablet or mobile) which all have their own respective pixel limits for snippets. You can also bold keywords, add rich snippets or description prefixes like a date to see how the page may appear in Google.

You can read more about this update and changes to pixel width and SERP snippets in Google in our new blog post.

The largest update is that the tool now allows you to edit page titles and meta descriptions directly in the SEO Spider as well. This subsequently updates the SERP snippet preview and the table calculations letting you know the number of pixels you have before a word is truncated. It also updates the text in the SEO Spider itself and will be remembered automatically, unless you click the ‘reset title and description’ button. You can make as many edits to page titles and descriptions as you like and they will all be remembered.

This means you can also export the changes you have made in the SEO Spider and send them over to your developer or client to update in their CMS. This feature means you don’t have to try and guesstimate pixel widths in Excel (or elsewhere!) and should provide greater control over your search snippets. You can quickly filter for page titles or descriptions which are over pixel width limits, view the truncations and SERP snippets in the tool, make any necessary edits and then export them. (Please remember, just because a word is truncated it does not mean it’s not counted algorithmically by Google).

Small Update – Version 3.3 Released 23rd March 2015

We have just released another small update to version 3.3 of the Screaming Frog SEO Spider. Similar to the above, this is just a small release with a few updates, which include –

Fixed a relative link bug for URLs.
Updated the right click options for ‘Show Other Domains On This IP’, ‘Check Index > Yahoo’ and OSE to a new address.
CSV files no longer include a BOM (Byte Order Mark). This was needed before we had Excel export integration. It causes problems with some tools parsing the CSV files, so has been removed, as suggested by Kevin Ellen.
Fixed a couple of crashes when using the right click option.
Fixed a bug where images only linked to via an HREF were not included in a sitemap.
Fixed a bug affecting users of 8u31 & JDK 7u75 and above trying to connect to SSLv3 web servers.
Fixed a bug with handling of mixed encoded links.
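
If you parse exports from both older and newer versions, the BOM change is easy to handle on the reading side. The Python sketch below is one way to do it; the column names `Address` and `Status Code` and the helper name are assumptions for the example.

```python
import csv
import io


def read_export(data: bytes):
    """Parse CSV bytes whether or not they start with a UTF-8 BOM."""
    # "utf-8-sig" drops a leading BOM if present and otherwise behaves like UTF-8.
    text = data.decode("utf-8-sig")
    return list(csv.DictReader(io.StringIO(text)))


with_bom = b"\xef\xbb\xbfAddress,Status Code\r\nhttps://example.com/,200\r\n"
without_bom = b"Address,Status Code\r\nhttps://example.com/,200\r\n"
```

Decoding `with_bom` as plain `utf-8` instead would leave the BOM glued to the first header name, which is the kind of parsing problem the release note refers to.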

You can download the SEO Spider 3.3 now.

Thanks to everyone for all their comments on the latest version and feedback as always.

1) Tree View

You can now switch from the usual ‘list view’ of a crawl, to a more traditional directory ‘tree view’ format, while still maintaining the granular detail of each URL crawled you see in the standard list view.

This additional view will hopefully help provide an alternative perspective when analysing a website’s architecture.

The SEO Spider doesn’t crawl this way natively, so switching to ‘tree view’ from ‘list view’ will take a little time to build, & you may see a progress bar on larger crawls for instance. This has been requested as a feature for quite some time, so thanks to all for their feedback.

Export & Sort

After a coffee/nap/cat-vid, you should hopefully come back to a 100% completed crawl with every page speed score you could hope for.

Navigate over to the custom extraction tab (Custom > Filter > Extraction) and hit export to download it all into a handy .xls spreadsheet.

Once the export is open in Excel hit the find and replace option and replace https://developers.google.com/speed/pagespeed/insights/?url= with nothing. This will bring back all your URLs in the original order alongside all their shiny new speed scores for mobile and desktop.
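
If you would rather script the find and replace than do it in Excel, the same step is a one-liner in Python. The prefix is the one quoted above; the function name and example URL are just for illustration.

```python
PSI_PREFIX = "https://developers.google.com/speed/pagespeed/insights/?url="


def original_url(psi_url):
    """Strip the PageSpeed Insights prefix to recover the crawled page URL."""
    if psi_url.startswith(PSI_PREFIX):
        return psi_url[len(PSI_PREFIX):]
    return psi_url  # leave anything without the prefix untouched
```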

After a tiny bit of formatting you should end up with a spreadsheet that looks something like this:

Bonus

What I find particularly powerful is the ability to combine this data with other metrics the spider can pull through in a separate crawl. As list mode exports in the same order it’s uploaded in, you can run a normal list mode crawl with your original selection of URLs connected to any API, export this and combine it with your PSI scores.
This essentially allows you to make an amalgamation of session data, PSI scores, response times, GA trigger times alongside any other metrics you want!
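
Relying on row order works, but joining the two exports on the URL is a little safer. Here is a small Python sketch of that join, assuming both exports have been loaded as lists of dictionaries sharing an `Address` column; the column names and the helper are assumptions, not something the exports guarantee.

```python
def merge_by_url(crawl_rows, psi_rows, key="Address"):
    """Attach PSI columns to each crawl row that has a matching URL."""
    psi_by_url = {row[key]: row for row in psi_rows}
    merged = []
    for row in crawl_rows:
        combined = dict(row)  # copy so the input rows are not modified
        combined.update(psi_by_url.get(row[key], {}))
        merged.append(combined)
    return merged
```

Rows without a PSI match are kept as they are, so nothing from the original crawl is lost.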

2) Finding Poor Titles and Descriptions

Unique metadata matters for SEO. Keywords should be used properly in the title tag and the meta description. A good title should be no longer than 60 characters, with the keyword near its start. For the meta description, a length of 160 characters is recommended. Screaming Frog SEO Spider works the way search optimization professionals do. The tool flags titles and descriptions that are too long for search engines to display in full. It reports results for titles separately: URL, occurrences, length and content. You can then fix the errors found and make the necessary corrections.
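
The checks described here are simple enough to sketch in Python. The 60 and 160 character limits come from the paragraph above; the function, the issue labels and the input format are assumptions for illustration and are not the SEO Spider's own filters.

```python
TITLE_MAX = 60         # recommended title length from the text above
DESCRIPTION_MAX = 160  # recommended meta description length


def audit_meta(pages):
    """pages: iterable of (url, title, description) tuples."""
    issues = []
    seen_titles = set()
    for url, title, description in pages:
        if not title:
            issues.append((url, "missing title"))
        else:
            if len(title) > TITLE_MAX:
                issues.append((url, "title over 60 characters"))
            if title in seen_titles:
                issues.append((url, "duplicate title"))
            seen_titles.add(title)
        if len(description) > DESCRIPTION_MAX:
            issues.append((url, "description over 160 characters"))
    return issues
```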

Interface

Everything starts with the ‘Enter URL to spider’ field, where you enter the site address and press the ‘Start’ button.

Naturally, this starts the crawl, and once it finishes we can begin the analysis. Here we immediately hit the first drawback compared with PageWeight: you cannot supply a local (that is, your own) robots.txt. In principle, sections can be excluded from the crawl via Configuration > Exclude, but that is less convenient. Still, let's get to know the program's interface and features.
At first it can be baffling that the list of pages contains a pile of images, but they can be filtered out instantly, either by choosing the HTML filter (incidentally, the Export button exports the current results from the main window to Excel, including as .xlsx):

Or by switching to HTML in the sidebar. Both options leave only the actual HTML pages in the main window:

The latest version (3.0 at the time of writing) added the ability to view the site structure. This lets you, for example, study competitors' site structures before building your own site.

Note that every tab in the program has its own filters. You can select, for example, only pages returning a 301 redirect and export them to Excel. On the URI tab you can select URLs longer than 115 characters, URLs with underscores instead of hyphens (the Underscores filter), duplicate pages (Duplicate), and URLs with parameters (Dynamic). On the Title tab you can select titles longer than 65 characters or shorter than 30, titles identical to the H1 on the page, and missing titles. On the Images tab you can select all images over 100 kilobytes and images without alt text. And so on.
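
As a rough illustration of what the URI filters look for, here is a Python sketch that applies three of them to a single URL. The 115 character limit comes from the text above; treating any query string as ‘Dynamic’ is a simplifying assumption, and the function name is invented for the example.

```python
from urllib.parse import urlsplit


def uri_flags(url, max_length=115):
    """Return the names of the URI-style filters a URL would fall under."""
    parts = urlsplit(url)
    flags = []
    if len(url) > max_length:
        flags.append("Over 115 Characters")
    if "_" in parts.path:
        flags.append("Underscores")
    if parts.query:  # assumption: any query string counts as dynamic
        flags.append("Dynamic")
    return flags
```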

Columns in the main window can be rearranged by drag and drop, so you can move the most important ones towards the left of the window and save the settings via File > Default Config > Save Current.
Clicking a column name sorts by that column. Some of the columns are less obvious:

Title 1 Length: the length of the title.
Title 1 Pixel Width: the width of the title in pixels.
Level: the page's depth level.
Word Count: the number of words inside the body tags.
Size: the page size in bytes.
Inlinks: the number of internal links pointing to the page.
Outlinks: the number of internal links from the page.
External Outlinks: the number of external links from the page. Make a bet with the guys on who can guess the highest number of links a given Sape site places on one page. If one guesses right and the other doesn't, the loser buys a link to his own site from that page.
Response Time: the page load time.

There is also a pane at the bottom with more detailed information about the page. For example, SERP Snippet shows how, by the program's estimate, the snippet will look in Google. This is useful if you care about making the title more clickable in the results.

Right-clicking the row of a URL opens a context menu, in which the most important item is Open in Browser.

It is also handy to select a group of URLs with Shift and delete them via Remove. Not from the site, of course, just from the crawl results. Otherwise I would long ago have deleted some scoundrels' URLs from a couple of sites…

The context menu also lets you check whether a page is in the Google, Yahoo and Bing indexes, look at backlinks through services such as Majestic SEO or Ahrefs, open the Google cache, or find the page in the Web Archive. You can also take a look at robots.txt and check the page code for errors. The context menu is the same on every tab.

Debugging in Chrome Developer Tools

Chrome is the definitive king of browsers, and arguably one of the most installed programs on the planet. What’s more, it’s got a full suite of free developer tools built straight in—to load it up, just right-click on any page and hit inspect. Among many aspects, this is particularly handy to confirm or debunk what might be happening in your crawl versus what you see in a browser.

For instance, while the Spider does check response headers during a crawl, maybe you just want to dig a bit deeper and view it as a whole? Well, just go to the Network tab, select a request and open the Headers sub-tab for all the juicy details:

Perhaps you’ve loaded a crawl that’s only returning one or two results and you think JavaScript might be the issue? Well, just hit the three dots (highlighted above) in the top right corner, then click settings > debugger > disable JavaScript and refresh your page to see how it looks:

Or maybe you just want to compare your nice browser-rendered HTML to that served back to the Spider? Just open the Spider and enable ‘JavaScript Rendering’ & ‘Store Rendered HTML’ in the configuration options (Configuration > Spider > Rendering/Advanced), then run your crawl. Once complete, you can view the rendered HTML in the bottom ‘View Source’ tab and compare with the rendered HTML in the ‘elements’ tab of Chrome.
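
If you have copied both versions of the HTML out to text, a quick diff can show what JavaScript added. This is a generic Python sketch using the standard library, with made-up markup; it is not a feature of the SEO Spider or Chrome.

```python
import difflib


def html_diff(raw_html, rendered_html):
    """Line-by-line unified diff between source HTML and rendered HTML."""
    return list(
        difflib.unified_diff(
            raw_html.splitlines(),
            rendered_html.splitlines(),
            fromfile="raw",
            tofile="rendered",
            lineterm="",
        )
    )


raw = '<div id="app"></div>'
rendered = '<div id="app">\n<h1>Hello</h1>\n</div>'
```

Lines starting with `+` in the result exist only in the rendered version, which is usually where JavaScript-injected content shows up.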

There are honestly far too many options in the Chrome developer toolset to list here, but it’s certainly worth getting your head around.

5) Internal Link Score

A useful way to evaluate and improve internal linking is to calculate internal PageRank of URLs, to help get a clearer understanding about which pages might be seen as more authoritative by the search engines.

The SEO Spider already reports on a number of useful metrics to analyse internal linking, such as crawl depth, the number of inlinks and outlinks, the number of unique inlinks and outlinks, and the percentage of overall URLs that link to a particular URL. To aid this further, we have now introduced an advanced ‘link score’ metric, which calculates the relative value of a page based upon its internal links.

This uses a relative 0-100 point scale from least to most value for simplicity, which allows you to determine where internal linking might be improved.

The link score metric algorithm takes into consideration redirects, canonicals, nofollow and much more, which we will go into in more detail in another post.

This is a relative mathematical calculation, which can only be performed at the end of a crawl when all URLs are known. Previously, every calculation within the SEO Spider has been performed at run-time during a crawl, which leads us on to the next feature.
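
To make the idea of an internal PageRank on a 0-100 scale more concrete, here is a textbook-style power iteration in Python. It is emphatically not the SEO Spider's link score algorithm, which also accounts for redirects, canonicals and nofollow; the damping factor, iteration count and rescaling against the top page are all assumptions of the sketch.

```python
def link_scores(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Pages with no outlinks spread their value evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    top = max(rank.values())
    # Rescale so the strongest page scores 100.
    return {page: round(100 * value / top) for page, value in rank.items()}
```

Because every score depends on every other page, it can only be computed once the full link graph is known, which matches the point above about running it at the end of a crawl.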

3) The Crawl Path Report

We often get asked how the SEO Spider discovered a URL, or how to view the ‘in links’ to a particular URL. Well, generally the quickest way is by clicking on the URL in question in the top window and then using the ‘in links’ tab at the bottom, which populates the lower window pane (as discussed in our guide on finding broken links).

But, sometimes, it’s not always that simple. For example, there might be a relative linking issue, which is causing infinite URLs to be crawled and you’d need to view the ‘in links’ of ‘in links’ (of ‘in links’ etc) many times, to find the originating source. Or, perhaps a page wasn’t discovered via a HTML anchor, but a canonical link element.

This is where the ‘crawl path report’ is very useful. Simply right click on a URL, go to ‘export’ and ‘crawl path report’.

You can then view exactly how a URL was discovered in a crawl and its shortest path (read from bottom to top).

Simple.
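
Under the hood, a shortest path like this is a classic breadth-first search over the link graph. The Python sketch below shows the general technique on a hypothetical dictionary of links; it is not the SEO Spider's implementation, and it returns the path from start to target rather than bottom to top.

```python
from collections import deque


def crawl_path(links, start, target):
    """Shortest chain of links from start to target, or None if unreachable."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if page == target:
            path = []
            while page is not None:
                path.append(page)
                page = parents[page]
            return path[::-1]
        for linked in links.get(page, []):
            if linked not in parents:  # first visit is the shortest route
                parents[linked] = page
                queue.append(linked)
    return None
```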

Small Update – Version 9.2 Released 27th March 2018

We have just released a small update to version 9.2 of the SEO Spider. Similar to 9.1, this release addresses bugs and includes some small improvements as well.

Speed up XML Sitemap generation exports.
Add ability to cancel XML Sitemap exports.
Add an option to start without initialising the Embedded Browser (Configuration->System->Embedded Browser). This is for users that can’t update their security settings, and don’t require JavaScript crawling.
Increase custom extraction max length to 32,000 characters.
Prevent users from setting database directory to read-only locations.
Fix switching to tree view with a search in place shows “Searching” dialog, forever, and ever.
Fix incorrect inlink count after re-spider.
Fix crash when performing a search.
Fix project saved failed for list mode crawl with hreflang data.
Fix crash when re-spidering in list mode.
Fix crash in ‘Bulk Export > All Page Source’ export.
Fix webpage cut off in screenshots.
Fix search in tree view, while crawling doesn’t keep up to date.
Fix tree view export missing address column.
Fix hreflang XML sitemaps missing namespace.
Fix needless namespaces from XML sitemaps.
Fix blocked by Cross-Origin Resource Sharing policy incorrectly reported during JavaScript rendering.
Fix crash loading in large crawl in database mode.

1) Dark Mode

While arguably not the most significant feature in this release, it is used throughout the screenshots – so it makes sense to talk about first. You can now switch to dark mode, via ‘Config > User Interface > Theme > Dark’.

Not only will this help reduce eye strain for those that work in low light (everyone living in the UK right now), it also looks super cool – and is speculated (by me now) to increase your technical SEO skills significantly.

The non-eye-strained among you may notice we’ve also tweaked some other styling elements and graphs, such as those in the right-hand overview and site structure tabs.

4) Tree-View Export

If you didn’t already know, you can switch from the usual ‘list view’ of a crawl to a more traditional directory ‘tree view’ format by clicking the tree icon on the UI.

However, while you were able to view this format within the tool, it hasn’t been possible to export it into a spreadsheet. So, we went to the drawing board and worked on an export which seems to make sense in a spreadsheet.

When you export from tree view, you’ll now see the results in tree view form, with columns split by path, but all URL level data still available. Screenshots of spreadsheets generally look terrible, but here’s an export of our own website for example.

This allows you to quickly see the breakdown of a website’s structure.
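
For a rough idea of what ‘columns split by path’ means, here is a small Python sketch that turns URLs into padded rows. The layout is an assumption made for illustration and will not match the real export column for column.

```python
from urllib.parse import urlsplit


def path_columns(urls):
    """One row per URL: the URL itself followed by one cell per path segment."""
    split_paths = [[s for s in urlsplit(u).path.split("/") if s] for u in urls]
    width = max((len(p) for p in split_paths), default=0)
    # Pad shorter paths with empty cells so every row has the same width.
    return [[u] + p + [""] * (width - len(p)) for u, p in zip(urls, split_paths)]
```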

3) Input Your Syntax

Next up, you’ll need to input your syntax into the relevant extractor fields. A quick and easy way to find the relevant CSS Path or Xpath of the data you wish to scrape, is to simply open up the web page in Chrome and ‘inspect element’ of the HTML line you wish to collect, then right click and copy the relevant selector path provided.

For example, you may wish to start scraping ‘authors’ of blog posts, and number of comments each have received. Let’s take the Screaming Frog website as the example.

Open up any blog post in Chrome, right click and ‘inspect element’ on the authors name which is located on every post, which will open up the ‘elements’ HTML window. Simply right click again on the relevant HTML line (with the authors name), copy the relevant CSS path or XPath and paste it into the respective extractor field in the SEO Spider. If you use Firefox, then you can do the same there too.

You can rename the ‘extractors’, which correspond to the column names in the SEO Spider. In this example, I’ve used CSS Path.

The ticks next to each extractor confirm the syntax used is valid. If you have a red cross next to them, then you may need to adjust a little as they are invalid.

When you’re happy, simply press the ‘OK’ button at the bottom. If you’d like to see more examples, then skip to the bottom of this guide.

Please note – This is not the most robust method for building CSS Selectors and XPath expressions. The expressions given using this method can be very specific to the exact position of the element in the code. This is something that can change due to the inspected view being the rendered version of the page / DOM, when by default the SEO Spider looks at the HTML source, and HTML clean-up that can occur when the SEO Spider processes a page where there is invalid mark-up.

These can also differ between browser, e.g. for the above ‘author’ example the following CSS Selectors are given –

Chrome: body > div.main-blog.clearfix > div > div.main-blog--posts > div.main-blog--posts_single--inside_author.clearfix.drop > div.main-blog--posts_single--inside_author-details.col-13-16 > div.author-details--social > a
Firefox: .author-details--social > a:nth-child(1)

The expressions given by Firefox are generally more robust than those provided by Chrome. Even so, this should not be used as a complete replacement for understanding the various extraction options and being able to build these manually by examining the HTML source.

The w3schools guide on CSS Selectors and their XPath introduction are good resources for understanding the basics of these expressions.
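
If you want to experiment with short, hand-built expressions before pasting them into an extractor, Python's standard library is enough for simple cases. The markup and class names below are hypothetical, and `xml.etree.ElementTree` only supports a limited XPath subset on well-formed markup, so treat this as a playground rather than a stand-in for the SEO Spider's extraction.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragment standing in for a blog post template.
SAMPLE = """<div class="post">
  <div class="author-name"><a href="/author/jane/">Jane Doe</a></div>
  <span class="comments">12</span>
</div>"""


def extract(markup, xpath):
    """Return the stripped text of every element matching the XPath."""
    root = ET.fromstring(markup)
    return [(el.text or "").strip() for el in root.findall(xpath)]
```

For example, `extract(SAMPLE, ".//div[@class='author-name']/a")` targets the author link by class rather than by its exact position in the page, which is the more robust style the note above recommends.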

4) Configurable Accept-Language Header

Google introduced locale-aware crawl configurations earlier this year for pages believed to adapt content served, based on the request’s language and perceived location.

This essentially means Googlebot can crawl from different IP addresses around the world and with an Accept-Language HTTP header in the request. Hence, like Googlebot, there are scenarios where you may wish to supply this header to crawl locale-adaptive content, with various language and region pairs. You can already use the proxy configuration to change your IP as well.

You can find the new ‘Accept-Language’ configuration under ‘Configuration > HTTP Header > Accept-Language’.

We have some common presets covered, but the combinations are huge, so there is a custom option available which you can just set to any value required.
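
To see what the header looks like on the wire, here is a minimal Python sketch that builds (but does not send) a request with a custom Accept-Language value. The URL and the language value are placeholders.

```python
import urllib.request


def build_request(url, accept_language):
    """Create a request carrying a custom Accept-Language header."""
    return urllib.request.Request(url, headers={"Accept-Language": accept_language})


request = build_request("https://www.example.com/", "de-DE,de;q=0.9")
# urllib stores header names capitalised as "Accept-language".
header_value = request.get_header("Accept-language")
```

Passing the request to `urllib.request.urlopen` would fetch the page with that header, which is a quick way to check whether a site really serves different content per language.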

2) Spelling & Grammar

If you’ve found yourself with extra time under lockdown, then we know just the way you can spend it (sorry).

You’re now also able to perform a spelling and grammar check during a crawl. The new ‘Content’ tab has filters for ‘Spelling Errors’ and ‘Grammar Errors’ and displays counts for each page crawled.

You can enable spelling and grammar checks via ‘Config > Content > Spelling & Grammar‘.

While this is a little different from our usual very ‘SEO-focused’ features, a large part of our roles is about improving websites for users. Google’s own search quality evaluator guidelines outline spelling and grammar errors numerous times as one of the characteristics of low-quality pages (if you need convincing!).

The lower window ‘Spelling & Grammar Details’ tab shows you the error, type (spelling or grammar), detail, and provides a suggestion to correct the issue.

The right-hand-side of the details tab also shows you a visual of the text from the page and errors identified.

The right-hand pane ‘Spelling & Grammar’ tab displays the top 100 unique errors discovered and the number of URLs it affects. This can be helpful for finding errors across templates, and for building your dictionary or ignore list.

The new spelling and grammar feature will auto-identify the language used on a page (via the HTML language attribute), but also allow you to manually select language where required. It supports 39 languages, including English (UK, USA, Aus etc), German, French, Dutch, Spanish, Italian, Danish, Swedish, Japanese, Russian, Arabic and more.

You’re able to ignore words for a crawl, add to a dictionary (which is remembered across crawls), disable grammar rules and exclude or include content in specific HTML elements, classes or IDs for spelling and grammar checks.

You’re also able to ‘update’ the spelling and grammar check to reflect changes to your dictionary, ignore list or grammar rules without re-crawling the URLs.

As you would expect, you can export all the data via the ‘Bulk Export > Content’ menu.

Test Results for 22 Web Crawlers

After numerous tests we obtained the following results:

Program	Time to crawl 100 pages	Time to crawl 1,000 pages	Time to crawl 10,000 pages	Time to crawl 100,000 pages	Wide range of audited parameters	Flexible data filtering	Crawling arbitrary URLs	Page Rank calculation	Graph data visualization	Freeware
Screaming Frog SEO Spider	0:00:08	0:00:45	0:05:35	1:03:30	+	—	+	—	+	—
Netpeak Spider	0:00:04	0:00:30	0:04:53	0:55:11	+	+	+	+	+	—
SiteAnalyzer	0:00:06	0:00:22	0:06:47	2:04:36	+	+	+	+	+	+
Forecheck	0:00:15	0:01:12	0:08:02	1:36:14	+	—	+	—	—	—
Sitebulb	0:00:08	0:01:26	0:16:32	2:47:54	+	—	+	—	+	—
WebSite Auditor	0:00:07	0:00:40	0:05:56	2:36:26	+	—	—	—	+	—
Comparser	0:00:12	—	—	—	—	—	+	—	—	—
Visual SEO Studio	0:00:15	0:02:24	0:24:14	4:08:47	—	—	—	—	—	—
Xenu	0:00:12	0:01:22	0:14:41	2:23:32	—	—	—	—	—	+
Darcy SEO Checker	0:00:04	0:00:31	0:05:40	0:58:45	—	—	—	—	—	—
LinkChecker	0:00:29	0:00:52	0:03:22	0:52:04	—	—	—	—	—	+
PageWeight Desktop	0:00:06	0:00:56	0:17:40	4:23:15	—	—	—	+	—	—
Beam Us Up	0:00:08	0:01:03	0:10:18	1:43:03	—	—	—	—	—	+
Webbee	0:00:10	0:01:58	—	—	—	—	—	—	—	—
WildShark SEO spider	0:00:28	0:07:20	—	—	—	—	—	—	—	+
Site Visualizer	0:00:11	0:01:58	0:38:15	—	—	—	—	—	—	—
RiveSolutions SEO Spider	0:00:06	0:00:49	0:08:14	1:55:19	—	—	—	—	—	—
IIS SEO Toolkit	0:00:03	0:00:46	0:07:08	1:02:26	—	—	—	—	—	+
Website Link Analyzer	0:00:09	0:02:38	0:24:56	4:33:41	—	—	—	—	—	+
A1 Website Analyzer	0:00:24	0:05:32	0:53:15	8:42:11	—	—	—	—	—	+
seoBOXX WebsiteAnalyser	0:00:12	0:01:15	0:17:31	3:51:08	—	—	—	—	—	—
Smart SEO Auditor	0:04:46	—	—	—	—	—	—	—	—	—

Note: there is little point in focusing on the 100 and 1,000 page crawls, because different programs use different algorithms for traversing pages. The crawl speed for 10,000 and 100,000 pages is more telling, as it reflects a more or less stable crawler speed over a long distance.
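
The times in the table are easier to compare as pages per second. A small Python sketch of the conversion, using the H:MM:SS format from the table (the function names are made up for the example):

```python
def to_seconds(hms):
    """Convert an 'H:MM:SS' string from the table into seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def pages_per_second(pages, hms):
    """Average crawl speed for a run of the given size and duration."""
    return pages / to_seconds(hms)
```

For instance, the 1:03:30 result for 100,000 pages is 3,810 seconds, or roughly 26 pages per second.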

Windows

Open a command prompt (Start button, then type ‘cmd’ or search programs and files for ‘Windows Command Prompt’). Move into the SEO Spider directory (64-bit) by entering:

cd "C:\Program Files (x86)\Screaming Frog SEO Spider"

Or for 32-bit:

cd "C:\Program Files\Screaming Frog SEO Spider"

On Windows, there is a separate build of the SEO Spider called ScreamingFrogSEOSpiderCli.exe (rather than the usual ScreamingFrogSEOSpider.exe). This can be run from the Windows command line and behaves like a typical console application.

You can type ScreamingFrogSEOSpiderCli.exe --help to view all arguments and see all logging come out of the CLI.

To open a saved crawl:

ScreamingFrogSEOSpider.exe C:\Temp\crawl.seospider

To auto start a crawl:

ScreamingFrogSEOSpiderCli.exe --crawl https://www.example.com

Then additional arguments can merely be appended with a space.

For example, the following will mean the SEO Spider runs headless, saves the crawl, outputs to your desktop and exports the internal and response codes tabs, and client error filter.

ScreamingFrogSEOSpiderCli.exe --crawl https://www.example.com --headless --save-crawl --output-folder "C:\Users\Your Name\Desktop" --export-tabs "Internal:All,Response Codes:Client Error (4xx)"

Please see the full list of command line options available to supply as arguments for the SEO Spider.
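
If you launch crawls from a script, it can help to build the argument list programmatically so that paths and tab names with spaces are quoted correctly. The Python sketch below only uses the arguments shown in the example above; the function name is hypothetical and the command is built, not executed.

```python
def build_cli_args(url, output_folder, export_tabs):
    """Assemble the headless crawl command from the example as a list."""
    return [
        "ScreamingFrogSEOSpiderCli.exe",
        "--crawl", url,
        "--headless",
        "--save-crawl",
        "--output-folder", output_folder,
        # Tabs are passed as a single comma-separated value.
        "--export-tabs", ",".join(export_tabs),
    ]


args = build_cli_args(
    "https://www.example.com",
    r"C:\Users\Your Name\Desktop",
    ["Internal:All", "Response Codes:Client Error (4xx)"],
)
```

Handing `args` to `subprocess.run(args)` from the SEO Spider directory should start the crawl, with Python taking care of quoting each argument.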

Conclusion

Let's sum up what makes Netpeak Spider stand out among the many other programs:

Focus on fixing SEO errors → they get a dedicated report grouped by severity, with quick export of all errors in a few clicks for further work with developers and content managers.
SEO audit in PDF + white label → brand the report and add contact details to run express audits quickly and win new contracts for SEO services.
Integration with Google Analytics, Search Console and Yandex.Metrica → find new insights by enriching crawl results with analytics data and search queries: you would have to look hard to find this much useful data elsewhere on the market!
Data segmentation → a unique feature that lets you look at a site in a completely new way. I once crawled a site and spent another week turning it over, finding more and more growth points.
Full customizability → use the parameters and settings to the maximum: everything is in your hands!
Advanced results table → the most powerful on the market, with impressive features: sorting, grouping, filtering, quick search and adjusting the table to your needs.
Built-in tools → source code and HTTP header analysis, internal PageRank calculation, a Sitemap validator and generator: you no longer need to leave the program to handle a wide range of tasks.
And, as the cherry on top, optimal use of computer resources → all this functionality uses minimal memory, and the internal database lets you work with a huge number of URLs.

P.S. We also have a second product, Netpeak Checker, for which we prepared a similar article. I invite you to watch the video and read about it and, at the end, of course, take the test!